Preface


From the earliest days of AI, scientists have 
recognized that the representation of a task 
domain is a major factor in the success of 
symbolic processing systems. However, current 
AI systems are usually limited to the 
representations that are hardwired by their 
programmers. To develop truly intelligent 
systems, AI needs to endow its creations with 
the ability to select and even generate their own 
representations and problem formulations. This 
ability distinguishes human culture from animal 
learning: the plethora of languages, formalisms, 
and notations we invent to help us solve 
problems and enrich our lives. 

The pioneering work of Saul Amarel 
demonstrated that, in principle, it is possible for 
an automated system to change its 
representation of a task domain; and 
furthermore that the payoff in problem solving 
performance could be exponential. Studies of 
expert problem solving behavior confirm the 
importance of the ability to choose good models 
and representations. Since the mid eighties, 
research has greatly expanded in this field of AI. 
Various approaches have been developed, 
including automatic theory revision; automatic 
abstraction, approximation and refinement of 
task domains; and transfer of knowledge from 
one representational formalism to another. 
Prototype systems have been developed in areas 
such as automatic planning and programming, 
engineering design, analytical reasoning 
problems and qualitative physics. 

These are the proceedings for the third bi- 
annual workshop on change of representation 
and problem reformulation. The first workshop 
was chaired by Paul Benjamin in June of 1988 
and sponsored by Philips Laboratories at 
Tarrytown, New York. Paul Benjamin edited a 
collection of revised papers from the workshop 
and published the book Change of
Representation and Inductive Bias through
Kluwer Academic in 1989. The second
workshop was chaired by Jeffrey Van Baalen in 
March of 1990 at Menlo Park, California. 
Facilities were provided by Price Waterhouse 
Technology Center, with grant money provided 
by ACM SIGART. The third workshop was 
held at the Asilomar conference center in Pacific 
Grove, California in late April of 1992. Grant
money was provided by AAAI and ACM
SIGART.

In contrast to the first two workshops, this 
workshop was focused on analytic or 
knowledge-based approaches, as opposed to 
statistical or empirical approaches called 
'constructive induction'. The organizing 
committee believes that there is a potential for 
combining analytic and inductive approaches at 
a future date. However, it became apparent at 
the previous two workshops that the 
communities pursuing these different 
approaches are currently interested in largely 
non-overlapping issues. The constructive 
induction community has been holding its own 
workshops, principally in conjunction with the 
machine learning conference. While this 
workshop is more focused on analytic 
approaches, the organizing committee has made 
an effort to include more application domains. 
We have greatly expanded from the origins in 
the machine learning community. Participants in 
this workshop come from the full spectrum of 
AI application domains including planning, 
qualitative physics, software engineering, 
knowledge representation, and machine 
learning.

Abstract

Planning will be an essential part of future au- 
tonomous robots and integrated intelligent sys- 
tems. This paper focuses on learning problem 
solving knowledge in planning systems. The sys- 
tem is based on a common representation for 
macros, abstractions, and cases. Therefore, it is 
able to exploit both classical and case-based tech- 
niques. The general operators in a successful plan 
derivation would be assessed for their potential 
usefulness, and some stored. The feasibility of this 
approach was studied through the implementation 
of a learning system for abstraction. New macros 
are motivated by trying to improve the operator- 
set. One heuristic used to improve the operator- 
set is generating operators with more general pre- 
conditions than existing ones. This heuristic leads 
naturally to abstraction hierarchies. This investi- 
gation showed promising results on the towers of 
Hanoi problem. The paper concludes by describ- 
ing methods for learning other problem solving 
knowledge. This knowledge can be represented by 
allowing operators at different levels of abstraction 
in a refinement. 

Introduction 

This paper advocates a common representation for op- 
erators that includes abstract plans, cases and macros 
[Baltes, 1991]. An important aspect of this represen- 
tation is that a system should be able to learn the 
necessary problem solving knowledge. 

The implementation of a prototype system that 
learns abstraction hierarchies is described. The learn- 
ing system tries to improve the operator-set by ex- 
tracting macros with more general pre-conditions than 
existing ones. This leads naturally to the generation 
of abstraction hierarchies. Rather than searching for 
new macros explicitly, the learner extracts new macros 
from a successful plan. It tries to find operators that 
result in identical states and that differ in exactly one 
pre-condition predicate. If such operators are found, 
the system deletes the differing predicate from the
pre-conditions, thus forming an abstract operator. Since
the planning system is intended to support case-based 
planning techniques, a generalized and an instantiated 
version of the macro is stored. We intend to use a novel 
dynamic filtering scheme [Baltes, 1991] to delete poor 
macros. 

The learning system was tested on the towers of 
Hanoi problem and showed promising results. The re- 
mainder of the paper is organized as follows: first, the 
paper presents a common representation for planners, 
then reviews previous work on operator learning with 
macros. A later section explains how operators are learned
in our representation. Then, a description of the
implementation and an example are given. The paper
concludes by describing how we intend to learn other 
problem solving knowledge such as reactive rules or 
anticipation of failure. 

Macro-Operators 

A linear macro is a sequence of primitive op- 
erators. This sequence is usually generalized 
and added to the operator set as a new operator.
For example, useful macros in the
blocks world domain are pickup=(goto, grasp) and
putdown=(goto, ungrasp). Macros can be used
in the construction of new macros, for example
move=(pickup, putdown).

This paper focuses on linear macros because the for- 
mation of iterative or disjunctive macros depends on 
good linear ones [Shell and Carbonell, 1989]. Macros
speed up the planning process because they reduce the 
solution length. On the other hand, the generation 
of macros must be carefully controlled because new 
operators increase the branching factor of the search 
space. Minton showed that simply generating all pos- 
sible macros from a successful solution decreases per- 
formance [Minton, 1985]. 
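
The nesting of macros described above can be illustrated with a short sketch. The dictionary layout, the `expand` helper, and the name `move` for the composite macro are assumptions made for this illustration, not part of the system described in this paper.

```python
# Hypothetical macro table: each macro maps to the sequence it abbreviates.
MACROS = {
    "pickup": ["goto", "grasp"],
    "putdown": ["goto", "ungrasp"],
    "move": ["pickup", "putdown"],   # a macro built from other macros
}


def expand(name, macros=MACROS):
    """Return the primitive operator sequence that `name` stands for."""
    if name not in macros:           # primitive operator: nothing to expand
        return [name]
    primitives = []
    for step in macros[name]:        # expand each step recursively, in order
        primitives.extend(expand(step, macros))
    return primitives
```

A planner that applies `move` takes one search step instead of four, which is the reduction in solution length discussed above; the price is one more operator to consider at every state.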

Dynamic Filters 

As mentioned above, only a small number of macros should be
generated, ideally ones that will be useful in future
problem solving tasks. The basic problem of macro
learning is that the system has to predict the usefulness
of a macro based on its previous experience. The
following paragraphs describe the effect of adding one
macro to the operator set and derive a usefulness measure
for such an addition. This measure can be used
to dynamically delete macros.

By adding a macro m, the branching factor is increased.
However, the new macro will not be applicable
in all situations. Let b be the branching factor
without the macro in question. Let c be the fraction of
states where m is applicable. Furthermore, not all
applications of m will lead to a solution, so let u_m be the
usefulness of m, which is the overall chance of applying
m to achieve a solution: the ratio of the total number
of times m leads to a solution to the total number of
times any operator is applicable. If l_m is the number
of primitive operators in m and n is the length of the
solution without the macro, then the time complexity
for the new operator set is of the order:

$$(b + c)^{n / (l_m u_m)}$$

The branching factor is increased by the applicability
of m, and the effective solution length is reduced by the
chances of using m, and in proportion to m’s length. If
u_m is 1, this means m is used at each step of the solution,
and the plan is divided in length by the number
of operators in m. If planning is to be faster when a
macro is added, then the following inequality must be
satisfied:

$$b^n > (b + c)^{n / (l_m u_m)} \qquad (1)$$

$$\log b > \frac{1}{l_m u_m} \log(b + c) \qquad (2)$$

$$u_m > \frac{1}{l_m} \cdot \frac{\log(b + c)}{\log b} \qquad (3)$$


The macro length dominates the generality of its
pre-conditions, c, in equation 3. So it will be more important
to allow long macros than more specific macro
pre-conditions. However, the pre-conditions cannot be
ignored; impractically large values are required for b to
make the second fraction in equation 3 approach unity. Note
that u_m and l_m are not independent; as the length
grows the chances of the macro being used in a plan
decrease. Furthermore, it is also assumed that the space
searched does not change with m. Under this assumption,
b and c are independent.

Equation 3 cannot be used directly for selecting new
macros because u_m, b, and c cannot be effectively
computed a priori. However, these values can be
approximated statistically. After a number of trials, equation 3
can be used as a dynamic filter to remove unnecessary
macros.
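
As a rough illustration of such a filter, the sketch below assumes that equation 3 has the form u_m > (1 / l_m) * log(b + c) / log b and that estimates of b, c, and u_m have already been gathered over a number of trials. The function names and the layout of the statistics are hypothetical.

```python
import math


def macro_is_worthwhile(b, c, l_m, u_m):
    """Test the assumed form of equation 3 for one macro.

    b   : branching factor without the macro (must exceed 1)
    c   : fraction of states in which the macro is applicable
    l_m : number of primitive operators in the macro
    u_m : estimated usefulness of the macro
    """
    if b <= 1 or l_m < 1:
        raise ValueError("expects b > 1 and l_m >= 1")
    threshold = math.log(b + c) / (l_m * math.log(b))
    return u_m > threshold


def filter_macros(stats, b):
    """Return the names of the macros that pass the test.

    stats maps a macro name to its estimated (c, l_m, u_m) triple.
    """
    return {name for name, (c, l_m, u_m) in stats.items()
            if macro_is_worthwhile(b, c, l_m, u_m)}
```

With b = 3, c = 0.5 and l_m = 2 the threshold is about 0.57, so a macro of that length has to contribute to a solution in well over half of the situations counted before it pays for the larger branching factor.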

Iba [Iba, 1989] proposes dynamic filters in his 
MACLEARN system. However, the implementation 
seems ad hoc; the user determines when to call the 
dynamic filter routine, which deletes all macros that 
have not been used at least once in a previous problem 
solving episode. 

The utility measure is not based on the number of
pre-conditions in a macro-operator (as is done in
Minton’s system), since, as will be explained in the
remainder of the paper, the number of pre-conditions
does not increase in my system.

Macros, Abstractions, and Cases 

This section suggests a common representation for 
macros, abstractions, and cases in a planning sys- 
tem. It will show the similarity and differences be- 
tween these methods, and suggest that a common rep- 
resentation will allow a problem solver to use all three 
strategies simultaneously. 

The proposed representation will enable a planning 
system to maintain important advantages of previous 
systems: 

• The planner will learn only when there is strong 
motivation, in order to increase performance in 
the future. This point has been shown by the 
MACLEARN system (flatten the search space, [Iba, 
1989]) and by the CHEF system (repair failed plans 
and anticipate problems, [Hammond, 1989]). 

• Proposed macros are filtered statically as well as dy- 
namically. A new heuristic described in equation 3 
is used. 

• The planner learns from a worked example (similar
to MACLEARN, PiL2 [Yamada and Tsuji, 1989],
CHEF) rather than using a brute force search to
find new operators (which would be similar to MPS
[Korf, 1985]).

• The planner should be able to use a heuristic func- 
tion or other knowledge that is available. 

Korf mentioned the similarity between abstractions 
and macros [Korf, 1987]. Both methods try to reduce 
the search by generating a skeleton search space of 
the original problem space. Instead of searching in 
the original space, a solution is found in the skeleton 
space and this solution is then refined into a solution in 
the original problem space. One difference, however,
is that there can be more than one abstraction level,
whereas macros normally generate only one skeleton
space.

Cases can be viewed as long, specific macros. The 
main distinction between macros and cases is the way 
in which they are used in a planning system. Cases are 
fetched from memory and some plan critics are applied 
to change the case to the new situation. Macros are 
usually not altered, i.e. the sequence of elementary 
operators is not adapted to the new situation. 

The common feature among all three items is that 
the most important information stored is a set of pre- 
conditions and a set of effects, as for elementary oper- 
ators. 

Abstraction hierarchies are equivalent to elementary 
operators that are missing some pre-condition pred- 
icates. This means that although the specific exe- 
cution depends on all pre-conditions, the effects can 
be achieved independently of the actual value of the
deleted predicates in the pre-conditions. One abstract
operator can be specialized in a number of different 
ways. The representation can capture this by associ- 
ating a set of operators with pre-conditions and ef- 
fects. This structure represents that the effects can 
be achieved given that the pre-conditions are true, 
but that the instantiation of the plan may depend 
on predicates not mentioned in the pre-conditions. A 
method similar to PiL2’s perfect causality heuristic is 
used to generate new operators that depend on fewer 
pre-conditions. More than one level of abstraction can 
be represented by showing that elements of the refine- 
ment of an abstract operator can consist of abstract 
operators. 

Common representation 

In our common representation, shown in Figure 1, 
an operator is recursively represented as (a) a pre- 
conditions and effects pair, and (b) a set of refinements, 
each of which is an operator sequence. A primitive op- 
erator has no refinements, and can be executed. 

A variety of well-known problem solving knowledge 
is supported. An operator is like a macro if (a) there 
is only one refinement, (b) each operator in the re- 
finement is a primitive, and (c) pre-conditions and ef- 
fects predicate arguments are instantiated in neither 
the operator nor the refinement (i.e. the macro has 
formal parameters). A case is an operator with a
refinement that is a fully instantiated (long) sequence
of primitives (i.e. an instantiated macro). An abstract
operator at criticality level k has refinement/s whose
operators are abstract ones at level k - 1. This
representation supports relaxed (predicates deleted from
pre-conditions, ABTWEAK) as well as reduced
(predicates deleted in pre-conditions and effects, ALPINE)
models of abstraction [Knoblock, 1991].
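
The recursive structure just described can be sketched as a small data type. This is only an illustration under simplifying assumptions: predicates are plain strings, a single `instantiated` flag stands in for "all arguments are bound", and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Operator:
    name: str
    pre: FrozenSet[str]
    post: FrozenSet[str]
    # Each refinement is an operator sequence; a primitive has none.
    refinements: List[List["Operator"]] = field(default_factory=list)
    # Simplification: True when every argument is bound to an object.
    instantiated: bool = False

    def is_primitive(self):
        return not self.refinements

    def _single_primitive_refinement(self):
        return (len(self.refinements) == 1
                and all(op.is_primitive() for op in self.refinements[0]))

    def is_macro(self):
        """One refinement of primitives, with formal parameters."""
        return self._single_primitive_refinement() and not self.instantiated

    def is_case(self):
        """One refinement of primitives, fully instantiated."""
        return self._single_primitive_refinement() and self.instantiated
```

Abstract operators, subgoal sequences, and the other kinds of knowledge fit the same type: they differ only in what their refinements contain, which is why a single recursive control strategy can process all of them.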

An operator is a subgoal sequence if all refinements 
contain no primitive operators (e.g. means ends anal- 
ysis). Anticipation of failure can be represented by an 
operator whose single refinement is a single operator 
pre-conditions, effects pair in which there is an ad- 
ditional effect (such as “avoid soggy broccoli”). This 
ensures that the planner knows about the problem, and 
the refinement of the failure anticipation operator will 
be expanded using the successful plan, which is also 
stored as an operator. Since our representation does 
not enforce a common level of abstraction for opera- 
tors in the refinement, a case or macro can also be 
generalized by making some operators non-primitives. 
This allows us to store adaptations of a case such as 
the replacement of some steps. Reactive rules may be 
represented as an operator whose single, two-operator 
refinement is a fully instantiated, primitive first opera- 
tor, followed by a non-primitive pre-conditions, effects 
pair. If this two-operator refinement is reversed, then
the resulting operator is suitable for backward chaining
from the goal (similar to RWM [Güvenir and Ernst,
1990]).


While this generality provides a common operator 
representation, it also presents the immediate problem 
of controlling the creation of operators, so that plan- 
ning is not impossibly expensive. We intend to control 
this using the dynamic operator deletion mechanism 
introduced above. In addition, the learning methods 
that add operators to the case memory must do so only 
when there is strong justification, and must choose 
“important” parts of new plans for storage, deciding 
the level of abstraction, number of refinements, and so 
on. This is the subject of current work. The common 
representation enables us to treat the various kinds of 
planning system in a single consistent framework, to 
better aid analysis and comparison. 

The planner may choose to “forget” the refinements 
of some operators, when their usefulness decreases. 
But the pre-conditions, effects pair is retained, and 
the details can be replanned if necessary. 

Planning using a common Representation 

This section indicates how one might use the operator 
representation given in this paper. The planner should 
combine case-based as well as classical planning tech- 
niques, to take advantage of both previous experience, 
and the ability to solve new problems. What is needed 
is a control strategy that recalls and uses previous ex- 
periences to solve similar new problems, but gracefully 
moves into classical planning if no similar cases can be 
found. The recursive structure of the representation 
lends itself well to a recursive control strategy. Learn- 
ing is designed to support and improve the planning 
process, by storing new operators. The planner should 
restrict the branching factor of the search space by fo- 
cusing on a small number of operators instead of all 
applicable ones. 

The inputs to the planner are the initial state, the goal
state, and the operator set. Additional inputs are a resource
limit and a skeleton plan agenda, which may support
resource limited and multiple task planning. Planning
begins by matching and recalling stored operators
that have pre-conditions and effects similar to the
current state and goal. The refinement/s of these
operator/s will give various types of plans to be considered
for solving the current problem.

Recalling similar operators A stored, similar case 
may have additional pre-conditions or effects, or be 
missing some. Operators should be recalled when their 
pre-conditions and/or effects are similar to the current 
goal and initial state. Possible indexing schemes for 
recall can be based on the number of predicates in pre- 
conditions and effects, the predicates themselves, or 
combinations of predicates. 
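
One of the indexing schemes mentioned above, matching on the predicates themselves, might look like the following sketch. The Jaccard overlap score and the function names are assumptions chosen for illustration; the paper does not commit to a particular similarity measure.

```python
def overlap(a, b):
    """Jaccard overlap of two predicate sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def recall(operators, state, goal, limit=3):
    """Return up to `limit` stored operator names, most similar first.

    operators maps a name to a (pre-conditions, effects) pair.
    Pre-conditions are compared with the current state and effects
    with the goal; operators with no overlap at all are not recalled.
    """
    scored = []
    for name, (pre, post) in operators.items():
        score = overlap(pre, state) + overlap(post, goal)
        scored.append((score, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored[:limit] if score > 0]
```

Because the score tolerates both missing and additional predicates, a stored case is still recalled when it only partly matches the current problem, which is the situation the adaptation step has to handle.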

Recalling is independent of the learned operator
hierarchy; the fetched operator is not necessarily at the
top level of a refinement tree. For example, if there is
an abstract operator to move a medium disk
independently of the small disk, and one refinement of this is
to move the medium disk when both are on the first
peg, and if the current state is that both disks are on
that peg, that refinement is retrieved, rather than a
more abstract one.

KEY: [Pre, Post] indicates a non-primitive operator, <Pre, Post> a primitive one, and <[Pre, Post]> either. <[Pre, Post]>^k is
an operator with predicates at criticality level k or above.

General Operator:
[Pre, Post] -> <[Pre_11, Post_11]>, <[Pre_12, Post_12]>, ..., <[Pre_1n_1, Post_1n_1]>
               <[Pre_21, Post_21]>, <[Pre_22, Post_22]>, ..., <[Pre_2n_2, Post_2n_2]>
               ...
               <[Pre_m1, Post_m1]>, <[Pre_m2, Post_m2]>, ..., <[Pre_mn_m, Post_mn_m]>
Each Pre_i1 => Pre and each Post_in_i => Post.

Macro (uninstantiated arguments) or Case (instantiated arguments):
[Pre, Post] -> <Pre, Post_1> <Pre_2, Post_2> ... <Pre_n, Post>

Abstract operator:
[Pre, Post]^k -> <[Pre_11, Post_11]>^(k-1), <[Pre_12, Post_12]>^(k-1), ..., <[Pre_1n_1, Post_1n_1]>^(k-1)
                 <[Pre_21, Post_21]>^(k-1), <[Pre_22, Post_22]>^(k-1), ..., <[Pre_2n_2, Post_2n_2]>^(k-1)
                 ...
                 <[Pre_m1, Post_m1]>^(k-1), <[Pre_m2, Post_m2]>^(k-1), ..., <[Pre_mn_m, Post_mn_m]>^(k-1)

Automatic subgoaling: [Pre, Post] -> [Pre, Post_1] [Pre_2, Post_2] ... [Pre_n, Post]

Anticipation of failure: [Pre, Post] -> [Pre, Post_1]

Reactive rules: [Pre, Post] -> <Pre, Post_1> [Pre_2, Post]

RWM-type operator subgoal: [Pre, Post] -> [Pre, Post_1] <Pre_2, Post>

Figure 1: The representation of planning operators.

Adapting an existing plan Adaptation of a plan
to new initial states and goal states is done by
analysing the differences between the initial state and
the pre-conditions, and between the goal state and the
effects of the similar plan. There are a number of
general purpose adaptations to a plan that substitute one
operator, listed below. If these don’t provide a
complete plan, then we treat the adapted plan as a subplan
and use means ends analysis to complete it.

Replace Steps: An operator should be removed and 
steps inserted to achieve either pre-conditions or ef- 
fects. 

• Remove Side effect: If a plan fails because one
operator has a specific side effect, try to replace
this operator with one that works.

• Protect effect: A following operator destroys an
effect of the solution. Try to replace this operator
with one that does not change that effect.

Substitution: Replace a variable instantiation with 
a different object (the operator sequence is un- 
changed). 

To find where to replace an operator, pre-conditions
of the case that are not given in the current situation
are pushed forward up to the first operator depending
on those pre-conditions. Then the planner finds all
elements of the effects that are dependent on this
operator. If the effects are also part of the current goal,
the system generates a new planning problem from the
state just before the operator with the non-matching
pre-condition to the first operator that uses any of the
effects established. For example, if the goal is to have
a barbecue and one of the pre-conditions is to have
a match, this pre-condition is pushed forward to the
operator make-fire. Since fire is a prerequisite for a
barbecue, this effect is pushed backward to the
operator put steaks on fire, which has has-fire as a
pre-condition. The system then tries to “improvise”
and generate a plan using the state just before the
operator make-fire to operator put steaks on fire.
Given that we have a lighter in the current state, this
plan can be easily generated. The light match
operator is replaced by the use lighter operator. “Remove
side effects” and “Protect effect” are specializations of
the Replace Steps strategy.

If the non-matching pre-condition does not
establish a current goal predicate, the system tries to apply
all operators of the plan, substituting variables where
necessary (e.g. beans for broccoli). For example, given
the goal of making a beef and bean dish from the
ingredients, if the system retrieves a plan for a beef and
broccoli dish, the pre-condition have broccoli does not
establish any predicate in the current goal (beef and
beans dish). In this case, the system simulates the
plan and uses beans instead of broccoli.
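
The substitution adaptation can be sketched in a few lines. The plan encoding (a list of operator name and argument tuple pairs) and the operator names in the usage note are hypothetical; only the beans-for-broccoli replacement comes from the example above.

```python
def substitute(plan, binding):
    """Replace objects in every step of a plan.

    plan    : list of (operator name, argument tuple) pairs
    binding : maps an object in the stored case to its replacement
    The operator sequence itself is left unchanged.
    """
    return [(op, tuple(binding.get(arg, arg) for arg in args))
            for op, args in plan]
```

For instance, applying `substitute` with the binding `{"broccoli": "beans"}` to a stored plan `[("chop", ("broccoli",)), ("stir-fry", ("beef", "broccoli"))]` keeps both steps and only swaps the ingredient.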

After the case has been fixed to achieve all its ef- 
fects with the new initial conditions, the system tries 
to achieve missing goal conditions one by one, using 
means ends analysis. The first non-satisfied term of 
the goal conjunction is selected and a new planning 
problem is generated from the goal state of the case to 
the goal state of the original problem. 

The planner computes the subgoals that are neces- 
sary for the achievement of any of the adaptations or 
classical planning rules. It then retrieves similar cases 
for each of the generated subgoals and tries to work on 
them in order of similarity. This can also be used to
repair failed plans, if the failed plan is stored in memory
with a new effect added so that the failure is avoided.

Learning General Operators 

Many researchers have investigated different
methods for constructing macros [Korf, 1985; Korf, 1987;
Minton, 1985; Iba, 1989; Yamada and Tsuji, 1989]. A
comparison of those methods leads to the following
issues:

Generalized macros Korf’s MPS system [Korf,
1985] stores instantiated macros, whereas Yamada’s
PiL2 system [Yamada and Tsuji, 1989] and Iba’s
MACLEARN system [Iba, 1989] generalize macros, so
that they are more widely applicable. Although
generalization of macros seems intuitive, it also increases
the search space, especially if many objects exist in the
domain.

Worked examples The MPS system searches for
macros to fill the table. In the worst case, this may
prevent the algorithm from terminating, although a
solution to the problem may exist. This can occur if MPS
is trying to find an impossible macro. PiL2 and
MACLEARN use a “worked example” to extract macros. A
“worked example” is either a successful plan or part
of the search space that was searched when trying to
find a successful plan. This means that no extra search
effort is required for the generation of macros.

Motivation The motivation behind the MPS sys- 
tem is to combine automatic subgoaling with macros. 
Macros are used to serialize a subgoal sequence by 
guaranteeing that the goal conditions of previous sub- 
goals are satisfied after the application of the macro, 
although they may be temporarily destroyed during its 
application. The heuristic used in the MACLEARN
system is to generate macros between peaks in the
heuristic evaluation function. This means that macros
are used to flatten the search space of the heuristic
evaluation function so that valleys can be traversed faster.
The PiL2 system uses the perfect causality heuristic. It
extracts a macro from a successful plan if (a) the
pre-conditions of an operator in the plan were not satisfied
in the initial state, and (b) the pre-conditions of this 
operator were satisfied after the application of previ- 
ous operators. The motivation is to generate macros 
that allow the system to apply more operators to the 
initial state. 

Learning Abstraction Hierarchies with 
Macros 

Although a common representation is powerful, the 
manual generation of useful operators requires a large 
amount of domain knowledge and is tedious. Ideally, 
the planning system should learn operators from its 
previous experience. Therefore, a learning system was 
designed to create new operators for the representa- 
tion. There are two major motivations for the system 
to learn: 

Failure The system generates a plan that failed. 
Here, it tries to avoid generation of invalid plans in 
the future. Examples are anticipation of failure in 
case-based planning systems or explanation based 
learning rules. 

Success Given that the system generated a success- 
ful plan, extract information from this plan to speed 
up the process for similar goals in the future. The 
generation of macro-operators and automatic sub- 
goaling fall into this category. 

The “need to learn” is easily recognized in the failure 
driven approach. The system knows exactly when new 
information has to be added, that is exactly when a 
generated plan failed. The problem is to decide what 
information should be stored in order to avoid failure in 
the future. For example, should the fully instantiated
problem be stored, or a generalization of it?

Learning in the success driven approach is harder, 
because the system must decide when to integrate new 
knowledge as well as what knowledge to integrate. For
example, the MACLEARN system motivates learning
by trying to flatten the search space. In PiL2, a
sequence of operators that are used only to generate 
pre-conditions of a following operator should be com- 
bined in one macro so that the operator can be applied 
to the initial state. 

This paper proposes two new heuristics for the
generation of macros. The motivation is that, to improve
performance, a macro learner must improve the
operator set. A macro-learner only changes the operator
set; it does not tell the planner when to apply
new operators. For example, it does not affect the
heuristic evaluation function. Previous systems such
as MACLEARN and PiL2, however, do not take the
current operator set into consideration when learning
new macros. There are two ways in which an operator
set can be improved.

Heuristic 1 Try to create new abstract operators
that contain fewer pre-conditions than existing oper- 
ators, and identical effects. That means that certain 
conditions can more easily be generated. 

Heuristic 2 Try to introduce operators that have
fewer effects than existing ones. This generates
operators with more specific effects, so that the planning
system can affect the world in a more controlled way.

Heuristic 1 is more interesting because it generates 
an abstraction hierarchy of operators. This paper de- 
scribes the implementation of a macro-learner that 
uses only the first heuristic to find new macros. 

Implementation of the Macro-Learner 

The learning system described in this paper is an ad- 
dition to the AbTweak planning system implemented 
by Yang [Yang and Tenenberg, 1990]. The macro- 
learner generates a successful plan using AbTweak and 
extracts macros from it. 

Figure 2 is a pseudo code description of the algo- 
rithm used. Explanation based generalization (EBG) 
is a common technique for the generalization of a 
macro [Minton, 1985]. The problem is, given a sequence
of operators and variable instantiations, to compute
the weakest set of pre-conditions that still allows the
achievement of its effects.

Post-Conditions The post-conditions of an opera- 
tor are the set of facts that must be true after appli- 
cation of the operator. It is different from the effects 
of an operator because the effects only mention facts 
that are changed by the operator. However, the pre- 
conditions that are not affected must also be true after 
application of the operator. The post-conditions are 
equivalent to the effects plus all predicates in the pre- 
conditions that are unchanged. 
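
The definition above can be written down directly. In this sketch literals are plain strings such as "onb X" and "not onb X"; that encoding and the helper names are assumptions made for illustration.

```python
def negate(literal):
    """Flip a literal: 'onb X' <-> 'not onb X'."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def post_conditions(pre, effects):
    """Effects plus every pre-condition that the effects leave unchanged."""
    effects = set(effects)
    # A pre-condition survives unless the operator asserts its negation.
    unchanged = {p for p in pre if negate(p) not in effects}
    return effects | unchanged
```

For an operator that moves the big disk from X to Z while the small disk is on Y, the pre-condition "onb X" is dropped because the effects contain "not onb X", while "ons Y" is carried over into the post-conditions.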

Logically Equivalent Descriptions The major 
problem in the implementation of the system is that 
there is more than one possible logical description of 
the world. For example, in the towers of Hanoi
problem, the states (not ons Peg1)(not ons Peg2) and
(ons Peg3) are equivalent because there are only three
possible pegs and each disk must always be on a peg.
The macro-learner extracts macros that have the same 
post-conditions but can be used to generate abstract 
macros. This means that the algorithm must establish 
the logical equivalence of world states. There are two 
possible solutions to this problem. 

The first method uses a resolution theorem prover to
prove the equivalence of post-conditions. This method
is the most general one. A set of axioms can be given
that can be used to prove facts about the domain.
Since the operator set describes all elementary actions
by which the world can be affected, it is also possible
to derive certain facts used in the proof. For example,
since the only operator that moves the small disk
establishes the fact that the small disk is on some peg,
the system can derive the fact that the disk is always
on some peg.

The second method uses a unique description of the 
world. Two states are identical if and only if they have 
the same description. This can be achieved by chang- 
ing the representation of operators or by designing a 
set of domain dependent rewrite rules that change a
description dynamically. For example, a rewrite rule can
be used to convert (not ons Peg1)(not ons Peg2)
to (ons Peg3).
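
A domain dependent rewrite rule of this kind could be sketched as follows. The string encoding of literals, the peg names, and the `canonical` function are assumptions for illustration; the rule relies on the domain fact that every disk is on exactly one of the three pegs.

```python
PEGS = ("Peg1", "Peg2", "Peg3")


def predicate_name(literal):
    """Return the predicate symbol of a literal such as 'not ons Peg1'."""
    body = literal[4:] if literal.startswith("not ") else literal
    return body.split()[0]


def canonical(state, disk_preds=("ons", "onb"), pegs=PEGS):
    """Rewrite a set of literals so each disk position is stated positively."""
    state = set(state)
    # Literals that do not describe a disk position pass through unchanged.
    result = {lit for lit in state if predicate_name(lit) not in disk_preds}
    for pred in disk_preds:
        on = [peg for peg in pegs if f"{pred} {peg}" in state]
        off = [peg for peg in pegs if f"not {pred} {peg}" in state]
        if on:
            # A positive fact implies the negative ones, so keep only it.
            result.add(f"{pred} {on[0]}")
        elif len(off) == len(pegs) - 1:
            # All pegs but one are excluded: the disk is on the remaining one.
            remaining = next(peg for peg in pegs if peg not in off)
            result.add(f"{pred} {remaining}")
        else:
            # Position not determined; keep what is known.
            result.update(f"not {pred} {peg}" for peg in off)
    return frozenset(result)
```

Two states can then be compared with ordinary set equality on their canonical forms, which is all the macro-learner needs in order to test whether two operators have the same post-conditions.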

This project focuses on the feasibility of learning
abstract operators rather than the design of a
practical planning system. Therefore, the standard
description language of operators was changed to generate a
unique description of all world states. For example,
the standard definition of the operator to move the big
disk in the towers of Hanoi problem is

move-big($X $Z)

Pre: (onb $X) (not ons $X) (not ons $Z)
(is-peg $X) (is-peg $Z)

Post: (onb $Z) (not onb $X)

This definition was changed to allow unique descrip-
tions of post-conditions. Therefore, the move-big op-
erator was defined as follows:

move-big($X $Y $Z)

Pre: (onb $X) (ons $Y)
(is-peg $X) (is-peg $Y) (is-peg $Z)

Post: (onb $Z) (not onb $X)

This means that all operators must reference all 
three pegs in their argument list, and that all facts 
are represented directly instead of indirectly. 

Example 

This section shows an example of the macro-learner in 
the towers of Hanoi domain with two disks. There are 
three reasons for selecting this problem. 

• It was easy to find a representation of operators that 
resulted in a unique logical description of the world. 

• The problem is well studied and comparison to other 
planners can be made. Also, the optimal solution for 
this problem is known. 

• The structure of the problem is well suited to ab- 
stract operators. In fact, abstract operators can re- 
duce its time complexity to be linear in the length 
of the solution. 

In the initial state, both disks are on the first peg. 
The goal is to move both disks to the third peg. The
standard operator set is changed to use a unique logical 
description and is represented by: 


Macro-Learner(Plan, Operator-Set)

Compute all possible macros in the plan.

For each macro in the plan and each operator do
    macro-gen := EBG(macro, op)
    world := post-conditions(macro-gen)
    if world = post-conditions(op) then
        if pre(macro) and pre(op) differ in one predicate
            ab := create-abstract-operator(macro, op)
            link(ab, op)
            link(ab, macro)
            operator-set := add-operator(ab, operator-set)
return(operator-set)


Figure 2: Macro-Learner Algorithm 


move-s($X $Y $Z)

Pre: (ispeg $X)(ispeg $Y)(ispeg $Z)
(ons $X)

Post: (not ons $X)(ons $Z)

move-b($X $Y $Z)

Pre: (ispeg $X)(ispeg $Y)(ispeg $Z)
(ons $Y)
(onb $X)

Post: (not onb $X)(onb $Z)

AbTweak is used to find a solution for this
problem, which is the sequence move-s(Peg1,Peg2),
move-b(Peg1,Peg3), move-s(Peg2,Peg3). From
this solution three macro sequences can be extracted.

seq-1: move-s(Peg1,Peg2), move-b(Peg1,Peg3)
seq-2: move-b(Peg1,Peg3), move-s(Peg2,Peg3)
seq-3: move-s(Peg1,Peg2), move-b(Peg1,Peg3),
move-s(Peg2,Peg3)
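The three sequences above are what one obtains by taking every contiguous run of at least two steps from the plan. The sketch below makes that reading explicit; treating macros as contiguous subsequences, and the helper name `macro_sequences`, are assumptions for illustration rather than a description of the actual implementation.

```python
def macro_sequences(plan, min_len=2):
    """Return every contiguous subsequence of the plan with at least min_len steps."""
    n = len(plan)
    return [tuple(plan[i:j])
            for i in range(n)
            for j in range(i + min_len, n + 1)]

# The three-step solution from the text, written as plain strings.
plan = ["move-s(Peg1,Peg2)", "move-b(Peg1,Peg3)", "move-s(Peg2,Peg3)"]
sequences = macro_sequences(plan)
```

Under this reading a plan of n steps yields n(n-1)/2 candidate sequences, so the number of candidates grows quadratically with plan length.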

From the first sequence seq-1, the following macro
can be generated after using EBG to compute its pre-
conditions and effects. Operator macro1 is not changed
when generalizing with the original operator move-b
because neither restricts the variable instantiations.

macro1($V1 $V2 $V3)

Pre: (is-peg $V1) (is-peg $V2)
(is-peg $V3)
(ons $V1) (onb $V1)

Post: (ons $V2) (onb $V3)
(not ons $V1) (not onb $V1)

The post-conditions of macro1 and move-b are iden-
tical (except for renaming of variables) and are given by
the following set of facts:

(is-peg $V1) (is-peg $V2) (is-peg $V3)
(ons $V2)(onb $V3)

The algorithm then compares the pre-conditions of
macro1 and move-b. The pre-conditions differ only in
the predicate ons, which is (ons $V1) for macro1 and
(ons $V2) for move-b. Therefore, the macro-learner
constructs an abstract operator in which the ons pred-
icate is deleted. This abstract operator is generalized,
and it contains two refinements: move-b and macro1.
However, in order to avoid unnecessary variable instan- 
tiations, the linked macro is fully instantiated. In that 
way, if the same problem has to be solved in the future, 
the variables do not need to be re-instantiated. How- 
ever, the abstract operator shows the generalization 
that is possible. Figure 3 shows the resulting operator 
hierarchy. 

Since only macro1 and move-b have identical post-
conditions, the abstract operator in figure 3 is the only
new operator that is added to the operator set. 
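The comparison step of this example can be sketched as follows. Pre-conditions are encoded as sets of (predicate, variable) pairs after renaming move-b's variables to $V1, $V2, $V3; that encoding and the two helper functions are hypothetical and only illustrate the "differ in one predicate" test.

```python
def differing_predicates(pre_a, pre_b):
    """Names of predicates whose literals are not shared by both precondition sets."""
    return {pred for (pred, _) in pre_a ^ pre_b}

def abstract_preconditions(pre_a, pre_b):
    """Drop the single differing predicate; return None unless exactly one differs."""
    diff = differing_predicates(pre_a, pre_b)
    if len(diff) != 1:
        return None
    (dropped,) = diff
    return {lit for lit in pre_a & pre_b if lit[0] != dropped}

pegs = {("is-peg", "$V1"), ("is-peg", "$V2"), ("is-peg", "$V3")}
pre_macro1 = pegs | {("ons", "$V1"), ("onb", "$V1")}
pre_move_b = pegs | {("ons", "$V2"), ("onb", "$V1")}  # move-b after renaming
pre_abstract = abstract_preconditions(pre_macro1, pre_move_b)
```

Here the only differing predicate is ons, so the abstract operator keeps the peg facts and (onb $V1) and no longer constrains the small disk.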

Evaluation 

With the implementation of the macro-learner, we 
tried to establish the usefulness of our first heuristic (to 
improve the operator set by generating new operators 
with more general pre-conditions). It is interesting to 
note that this heuristic leads to an abstraction hierar- 
chy for the two-disk towers of Hanoi problem that is 
identical to the one shown to be optimal by Knoblock 
[Knoblock, 1991]. If the planner uses a control strategy 
that supports abstractions, the time complexity grows 
only linearly with the length of the solution [Knoblock, 
1991]. 

Good performance of the planning system with the 
new operator set was expected, because the optimal 
set of abstractions was generated. This was verified in 
a number of experiments where the solution time was 
reduced from 40 to 24 seconds on a Sparc 2 station. 
It took ten seconds to compute the abstract operators. 
Similar results were obtained for the problem with the 
initial state (ons peg3)(onb peg1) and the goal state
(onb peg3) (ons peg3).

The macro-learner was also tested on the towers of 
Hanoi puzzle with three disks. The results of these 
experiments were similar to the ones for the previous 
example. The macro learner created two abstract op- 
erators: 

• move the medium disk ignoring the small disk.

• move the large disk ignoring the medium disk.


[Abstract-1($X $Y $Z)]
{move big disk, ignore small disk}

    <move-b($X $Y $Z)>
    {move big disk, small is on peg $Y}

    <move-s(Peg1 Peg3 Peg2)> <move-b(Peg1 Peg2 Peg3)>
    {move big disk, small is on peg 1, big is on peg 1}

<move-s($X $Y $Z)>
{move small disk}

Figure 3: Learned Operator Hierarchy for the Towers of Hanoi


These abstract operators are learned after only one 
successful plan is generated. After solving the prob- 
lem a second time, the abstract operator learned to 
move the large disk regardless of where the medium 
and small disks are. These two abstract operators to- 
gether with the primitive operator to move the small 
disk form the optimal operator set for the towers of
Hanoi problem with three disks. The solution time de-
creased from 2658 seconds to 29 seconds. The compu-
tation time for learning the abstract operators was 81
seconds. This result suggests that the learning time 
scales up much better than the planning time. 
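The scaling claim can be checked directly from the reported timings. The snippet below is plain arithmetic on the numbers given in this section; the variable names are illustrative.

```python
# Timings reported in the text, in seconds.
two_disk = {"before": 40, "after": 24, "learning": 10}
three_disk = {"before": 2658, "after": 29, "learning": 81}

# Growth from the two-disk to the three-disk problem.
planning_growth = three_disk["before"] / two_disk["before"]      # 66.45
learning_growth = three_disk["learning"] / two_disk["learning"]  # 8.1

# Speedup on the three-disk problem, without and with the one-time learning cost.
speedup_three = three_disk["before"] / three_disk["after"]
net_speedup_three = three_disk["before"] / (
    three_disk["after"] + three_disk["learning"])
```

Planning time without abstraction grows by a factor of about 66 between the two problems while learning time grows by a factor of about 8, and the three-disk run remains roughly 24 times faster even when the 81 seconds of learning are charged to a single run.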

The most interesting result of the towers of Hanoi 
problem with two and three disks was that the system 
learned the optimal set of abstractions. This means 
that it not only learned the correct number of abstrac- 
tion levels, but also the correct number of abstract op- 
erators for each level. Previous systems such as MPS, 
MAC-LEARN, and PiL2 are unable to learn these ab- 
stract operators. 

Learning other Problem Solving 
Knowledge 

This section describes methods for learning other prob- 
lem solving strategies that can be represented in our 
common representation. One of the main advantages 
of a common representation is that not all operators in 
a refinement have the same level of abstraction. There- 
fore, other strategies such as reactive rules can be used. 
These strategies can be learned by comparing all refine- 
ments of a general operator. 

After learning a new operator, the system uses addi- 
tional heuristics to incorporate the new operator into 
the existing operator set. All new macros are part 
of the refinement of a more abstract operator (or the 
original initial state, goal state pair). The system com- 
pares the new operator to all other refinements of its 
abstract operator. 


Raising Operators First the system tries to extract 
operators that occur in all refinements. The common 
operators are “raised” in the abstraction hierarchy, so 
that the planning system can focus on those operators. 
For example, given the abstract operator in figure 3, 
the operator to move the big disk is common in both 
refinements. In this case, the abstraction hierarchy is 
changed to reflect the fact that the operator move-b is
an essential part of moving the big disk. The resulting 
abstract operator is similar to the RWM type operator 
subgoals [Guvenir and Ernst, 1990]. If the common 
operator occurs at the beginning of the sequence, a 
reactive rule is formed. 
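A minimal sketch of the raising step, assuming each refinement is a list of operator names and that operators are matched by name only (ignoring their instantiation). Both function names are hypothetical.

```python
def common_operators(refinements):
    """Operators that occur in every refinement, in the order of the first one."""
    shared = set(refinements[0]).intersection(*map(set, refinements[1:]))
    return [op for op in refinements[0] if op in shared]

def is_reactive(refinements, op):
    """True if op is the first step of every refinement."""
    return all(seq and seq[0] == op for seq in refinements)

# The two refinements of the abstract operator in figure 3.
refinements = [["move-b"], ["move-s", "move-b"]]
raised = common_operators(refinements)
reactive = is_reactive(refinements, "move-b")
```

For the hierarchy of figure 3 this raises move-b, and no reactive rule is formed because move-b does not start the second refinement.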

Generating Abstractions from Subsequences 
The system also extracts equivalent post-conditions 
resulting from the execution of operators in all refine- 
ments. If matching post-conditions are found, these 
states are extracted as new abstract operators. For ex-
ample, assume that there are two operators to move the
big disk, move-b1 and move-b2. Furthermore, in fig-
ure 3, the system used move-b1 in the first refinement
and move-b2 in the second refinement. In that case,
there are no common operators in all refinements. Nev-
ertheless, common to all refinements is a state where
the small disk is on peg $Y. Therefore, the original
problem is broken up into two abstract operators. The 
first one moves the small disk onto the medium peg, 
the second abstract operator moves the medium disk. 
The resulting operator hierarchy is identical to a sub- 
goal sequence. 

Failure When the system generates an unsuccessful 
plan, some of its expectations are wrong. If the user 
provides the system with additional information ex-
plaining why the plan failed (e.g. (problem.x)), the
system can generate an abstract operator that relates
the original problem to an elaboration of the prob-
lem, where the effects have additional conditions (e.g.
(not problem.x)). In the future, the planner is re-
minded of this problem and can avoid it.
Conclusions 

The major contribution of this paper is the design of 
a learning system for a planner that combines macros, 
abstraction hierarchies, and case-based planning. The 
advantage of this approach is that techniques from 
both classical planning and case-based planning can 
be combined in the problem solving process. 

The paper describes an analytical dynamic filtering 
scheme used to rule out inefficient operators. The dy- 
namic filter is based on a formula relating the empirical 
usefulness and the length and branching factor of the 
operator. The common representation means that the 
dynamic filter can be applied to abstract operators and 
cases as well. 

The paper also compares three previous approaches 
to the selection of new macros: Korf’s MPS, Iba’s
MAC-LEARN, and Yamada’s PiL2 system. From this 
comparison, guidelines are suggested for the selection 
of new operators. The goal in creating a new operator 
is trying to improve the current operator set. There 
are two ways in which an operator can be improved: 

• Create an operator with more general pre- 
conditions. The effects of this operator can then 
be achieved in more states. The removal of pred- 
icates in pre-conditions leads to the generation of 
abstraction hierarchies.

• Create an operator with more specific effects. This 
removes side-effects of existing operators. 

A macro learner was implemented and tested on a 
number of problems in the towers of Hanoi domain. As
important parts of the complete planning system are
still missing, the implementation focused on comparing
the learned macros to the ones learned by other sys-
tems. The results in the towers of Hanoi domain are
promising. The system learned the optimal set of ab-
stract operators for the two and three disk problems.

The paper also describes methods to learn diverse 
problem solving knowledge such as reactive rules, and 
automatic subgoaling. The next step in our research 
is the implementation of a complete planning system 
that incorporates these methods. 

The paper also compares three different approaches 
to the selection of new macros. From this compari- 
son guidelines are suggested for the selection of new 
operators. The main motivation for these heuristics is 
to find widely applicable operators with very specific 
effects. 
Abstract 

This paper is a research summary on our work re- 
garding the interaction of abstraction and search 
control heuristics. This work involves the least-
commitment planner SNLP, an abstraction genera-
tion system similar to ALPINE to automatically gen-
erate problem reformulations, and a system similar
to STATIC to generate search-control heuristics.
We describe an elegant way to make SNLP shift
between representations automatically.

Introduction 

We are performing research on the interaction of ab- 
straction and search control heuristics. To get this 
research started we have developed a lifted version of 
McAllester’s partial order planning algorithm [6], called
SNLP. Extending an incomplete plan in SNLP involves 
choosing what part of a plan to extend, and how to ex- 
tend it. Choosing what to extend affects the structure 
of SNLP’s search space, and choosing how to extend 
defines how the search space is traversed. 

Previous work on the interaction of abstraction and 
search control rules acquired through EBL [5] used 
PRODIGY. PRODIGY is a total order planner that 
commits to planning decisions much earlier than SNLP. 
There are many differences between PRODIGY and 
SNLP. These differences cause problems as we adapt 
ALPINE [4] and STATIC [2] to generate abstractions 
and control rules for SNLP instead of PRODIGY. 

This paper starts with a description of SNLP and 
shows how a planning decision can be delayed. The sec- 
ond part of the paper discusses strategies for deciding 
when to consider which planning decision. The third 
part discusses two search control heuristics for guiding 
a decision. The last part discusses the status of our 
research. 

The Planner 

As part of our research into step ordering commitment
strategies when planning [1] we created a least com-
mitment planner using a lifted version of McAllester’s
algorithm [6]. Like many of the planners created since
Sacerdoti’s NOAH [7], our planner searches through a
space of plans to find a plan that solves a STRIPS plan-
ning problem.

Definition 1 A PLAN is a triple $\langle S, O, B \rangle$ in which
$S$ denotes a set of plan steps (also known as actions),
$O$ denotes a set of ordering constraints that specify a
partial order on $S$, and $B$ denotes a set of binding con-
straints over the variables mentioned by the steps in $S$.

The search starts with an initial plan which encodes
the STRIPS planning problem to be solved. This plan
consists of two steps $s_0$ and $s_\infty$. The step $s_0$ adds the
problem’s initial conditions, and $s_\infty$ requires the prob-
lem’s goals (as preconditions). The algorithm that de-
fines the search appears in figure 1. The algorithm is
invoked with a plan $P$, a set of open goals $G$, and a
set of causal links $L$. A causal link $s_i \xrightarrow{p} s_j$ is a triple
denoting that a step $s_i$ adds a proposition $p$ to fulfill a
precondition of step $s_j$. The open goals in $G$ are
the preconditions of steps $s_j$ that do not have associ-
ated causal links $s_i \xrightarrow{p} s_j$ in $L$. In the initial invocation $P$,
$G$, and $L$ are the initial plan, the preconditions of $s_\infty$,
and the empty set respectively. The algorithm works
by solving open goals in $G$ while protecting causal links
in $L$ from the effects of other steps.
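The definitions above can be sketched as data structures. The class and field names, the use of strings for propositions, and the explicit ordering of the initial step before the final step are assumptions for illustration; the text does not specify how the planner stores these objects.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Step:
    name: str
    preconditions: frozenset = frozenset()
    adds: frozenset = frozenset()
    deletes: frozenset = frozenset()

@dataclass(frozen=True)
class CausalLink:
    producer: Step     # s_i, the step that adds p
    proposition: str   # p
    consumer: Step     # s_j, the step whose precondition p fulfills

@dataclass
class Plan:
    steps: set = field(default_factory=set)      # S
    orderings: set = field(default_factory=set)  # O, as (before, after) pairs
    bindings: set = field(default_factory=set)   # B

def initial_problem(initial_conditions, goals):
    """Build the initial plan P, the open goals G, and the empty link set L."""
    s0 = Step("s0", adds=frozenset(initial_conditions))
    s_inf = Step("s_inf", preconditions=frozenset(goals))
    # Assumed: the initial step is ordered before the final step.
    plan = Plan(steps={s0, s_inf}, orderings={(s0, s_inf)})
    open_goals = [(goal, s_inf) for goal in sorted(goals)]
    return plan, open_goals, set()

plan, open_goals, links = initial_problem({"(on A Table)"}, {"(on A B)"})
```

Each open goal is paired with the step that needs it, which is the information step 2 of the algorithm relies on when it selects a goal.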

This planner is provably complete for breadth-first 
and IDA* searches. Its least-commitment nature pro- 
vides opportunities for reordering the consideration of 
different planning decisions. Step 2 can select any open 
goal at any time. A related algorithm provides even 
more flexibility in that it can perform causal link pro- 
tection at any time in the planning process. Causal-link 
protection need not appear in step 5. 

Deciding What to Refine 

Our focus, for now, is on step 2. The order in which 
open goals are solved affects the difficulty of finding a 
solution. For instance, we implemented G as a stack 
and as a queue. For some of our experimental domains, 
the stack was exponentially worse than the queue. The
reverse was true for other experiments. 
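The difference between the two implementations of G is only the end of the agenda from which the next goal is taken. A minimal sketch, with hypothetical goal strings and no new goals added during the run:

```python
from collections import deque

def selection_order(goals, discipline):
    """Return the order in which open goals are picked from G.

    No new goals are added here, so only the selection discipline is shown.
    """
    agenda = deque(goals)
    order = []
    while agenda:
        # A stack takes the most recently added goal, a queue the oldest one.
        order.append(agenda.pop() if discipline == "stack" else agenda.popleft())
    return order

goals = ["(on A B)", "(on B C)", "(clear A)"]
lifo = selection_order(goals, "stack")
fifo = selection_order(goals, "queue")
```

In the planner itself the preconditions of each newly added step join the agenda, so a stack pursues the newest subgoals depth-first while a queue works through goals level by level.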
Algorithm: SNLP($P$, $G$, $L$)

1. Termination: If $G$ is empty, report success and
stop.

2. Goal selection: Let $c$ be a proposition in $G$, and let
$S_{need}$ be the step for which $c$ is a precondition.

3. Operator selection: Let $S_{add}$ be an existing, or
new, step that adds $c$. If no such step exists and
none can be added then terminate and report fail-
ure. Let $L' = L \cup \{S_{add} \xrightarrow{c} S_{need}\}$. Backtrack point:
All existing and addable steps must be considered for
completeness.

4. Update goal set: Let $G' = (G - \{c\}) \cup$ the set of
preconditions of the new step if one was added.

5. Causal link protection: For every step $s_k$ that
might affect a causal link $s_i \xrightarrow{p} s_j \in L'$:

• Protect $s_i \xrightarrow{p} s_j$ from $s_k$ by adding constraints be-
tween the 3 steps involved. Backtrack point: All
ways to protect from $s_k$ must be considered
for completeness.

Let $P'$ be the resulting plan.

6. Recursive invocation: SNLP($P'$, $G'$, $L'$).

Figure 1: The Planning Algorithm SNLP 


Strategies for deciding which goal to handle next have 
a profound effect on the structure of the resulting search 
space. This is because delaying a planning decision in- 
creases the amount of information available when the 
decision is finally made. In the best case there would 
be enough information to force the decision. The op- 
timal strategy would maximize the amount of relevant 
information and reduce the branching at each of the 
backtrack points. If the number of branches can be re- 
duced to one at each point, then planning becomes triv- 
ial. Unfortunately, it is often the case that we cannot 
hold the number of branches to one, and determining 
the goal to select for reducing the branching factor is a 
nontrivial problem. 

One strategy we have experimented with involves as- 
signing a static priority value to each possibly open 
goal before starting the planning process. This is essen-
tially the “abstractness” number mentioned by McAlles-
ter [6]. One way to define these priorities uses ALPINE.
ALPINE was originally developed for PRODIGY, but
it maps very easily into SNLP. Using ALPINE ensures
that a causal link $s_i \xrightarrow{p} s_j$ is never
threatened when solving an open goal $q$ when the pri-
ority of $p$ is greater than that of $q$. This is due to ALPINE’s
ordered monotonicity property. This property has in- 
teresting effects when considering the causal structure 
of a plan. 
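Goal selection by static priority can be sketched with a priority queue. The priority table below is invented for illustration and is not output from ALPINE; the helper names are likewise hypothetical.

```python
import heapq

# Hypothetical priorities keyed by predicate name; larger means more abstract.
PRIORITY = {"on": 2, "clear": 1, "holding": 0}

def predicate_name(proposition):
    """Extract 'on' from a proposition written as '(on A B)'."""
    return proposition.strip("()").split()[0]

def by_priority(open_goals):
    """Yield open goals from highest to lowest static priority."""
    # The index breaks ties so that equal-priority goals keep their input order.
    heap = [(-PRIORITY[predicate_name(g)], i, g) for i, g in enumerate(open_goals)]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[2]

ordered = list(by_priority(["(clear A)", "(on A B)", "(holding C)", "(on B C)"]))
```

Because the priorities are fixed before planning starts, selecting the next goal costs only a heap operation and requires no inspection of the current plan.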

The causal links in the set L can be thought of as 
defining a plan P’s causal structure [10, 8, 3, 11]. A 
causal support for some proposition P is a minimal set 
of causal links, illustrated in figure 2 as arrows, between
steps, illustrated as circles. There is a link to provide
P, and every step mentioned in the causal support has 
all of its preconditions provided by links in the causal 
support. 



Figure 2: A causal support’s structure. 


Using ALPINE to assign priorities to open goals, our 
planner builds abstract causal supports for high-priority
open goals, and then refines them as the priorities of 
the pending open goals in G decrease. Unfortunately, 
even though high-priority links never get threatened 
when low priority open goals are being solved, exist- 
ing steps can threaten the causal links created while 
solving low-priority goals. Sometimes a new link can- 
not be protected from these threats. Such cases show 
that ALPINE does not give us the downward solution 
property [4]. This property states that if an abstract 
plan can be found, then it can always be refined into a 
less abstract plan [9]. ALPINE does give us the down- 
ward failure property. This ensures that if an abstract 
plan cannot be found, then a concrete plan cannot be 
found either. 

Currently, our algorithm protects causal links as they 
are threatened. This is not necessary. It may not even 
be desirable since there are a huge number of ways to 
protect a causal link when propositions can have un-
bound variables. For example, there are 5 different sets
of constraints that can be added to a plan to protect
$s_i \xrightarrow{p} s_j$ (where $p$ is (on ?x ?y)) from a step $s_k$ that
deletes (on ?a ?b). They are:

1. {$s_k$ before $s_i$}

2. {$s_k$ after $s_j$}

3. {?x $\neq$ ?a, ?y $\neq$ ?b, and $s_k$ between $s_i$ and $s_j$}

4. {?x $\neq$ ?a, ?y = ?b, and $s_k$ between $s_i$ and $s_j$}

5. {?x = ?a, ?y $\neq$ ?b, and $s_k$ between $s_i$ and $s_j$}

If causal link protection was delayed until all variables 
were bound, the threat might have taken care of itself 
as a side effect of other planning decisions. If the threat 
still exists when all variables are bound, then the last 
three branches can be avoided altogether. 
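The five constraint sets follow a simple pattern: two ordering options, plus one option for every way of making at least one argument pair differ. The sketch below enumerates them; the string encoding of constraints and the function name are assumptions for illustration.

```python
from itertools import product

def protection_options(link_args, threat_args):
    """Enumerate the constraint sets that protect a link from a threatening step."""
    options = [["s_k before s_i"], ["s_k after s_j"]]
    pairs = list(zip(link_args, threat_args))
    # Keep s_k between s_i and s_j, but make at least one argument pair differ.
    for relations in product(("!=", "="), repeat=len(pairs)):
        if "!=" not in relations:
            continue  # all arguments equal: the deletion would clobber the link
        bindings = [f"{a} {rel} {b}" for (a, b), rel in zip(pairs, relations)]
        options.append(bindings + ["s_k between s_i and s_j"])
    return options

options = protection_options(["?x", "?y"], ["?a", "?b"])
```

For a proposition with n arguments this gives 2 + (2^n - 1) branches, which is why delaying protection until the variables are bound, when only the two ordering options remain, can reduce branching substantially.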

Deciding How to Refine 

Step 3 of the original algorithm is very simplistic in that 
it tries to refine an open goal in every way possible. 
For example, in the blocks world domain, to solve the 
goal (on A B) the original algorithm might generate 
the sequence of steps: 




(move A B) > (move A C) > (move A B) 

We are implementing techniques from STATIC to 
avoid generating these sequences. STATIC takes a do- 
main description and generates a set of control rules 
for deciding how to resolve a goal taking into account 
the goal’s purpose and the other steps currently in the 
plan. Like ALPINE, STATIC was originally created 
for PRODIGY. Unfortunately, problems occur when we 
try to apply it to improve SNLP. For instance, one of 
the problems that STATIC detects and generates rules 
to avoid is due to a concept called goal stack cycles. 
The example above is an instance of such a cycle for 
PRODIGY. 

Unfortunately, goal stacks are an artifact of the way
PRODIGY plans. SNLP does not have them, but it
has a related concept called a goal’s purpose. This pur-
pose is defined to be a list of propositions created by
looking at the path of causal links that starts at the
goal’s associated step and ends at the step $s_\infty$. Sev-
eral examples of such paths appear in figure 2. At first
glance SNLP’s goal purposes seem to be identical to
PRODIGY’s goal stacks, but such is not the case. A
PRODIGY goal has only one goal stack, but since there
can be several paths from a step to $s_\infty$, an SNLP goal
can have more than one purpose. Also, as new causal 
links are added to a plan, the number of purposes for a 
goal increases. 
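A goal's purposes can be sketched as the set of causal-link paths from its step to the final step. The triple encoding of links, the step names, and the function `purposes` are hypothetical; the sketch assumes the causal links form an acyclic graph, as they do in a consistently ordered plan.

```python
def purposes(step, causal_links, goal_step="s_inf"):
    """List the proposition paths leading from `step` to the goal step.

    causal_links is a sequence of (producer, proposition, consumer) triples.
    """
    if step == goal_step:
        return [[]]
    paths = []
    for producer, proposition, consumer in causal_links:
        if producer == step:
            for rest in purposes(consumer, causal_links, goal_step):
                paths.append([proposition] + rest)
    return paths

# Step s1 supports the final step along two different paths.
links = [
    ("s1", "p", "s2"),
    ("s1", "q", "s3"),
    ("s2", "r", "s_inf"),
    ("s3", "t", "s_inf"),
]
s1_purposes = purposes("s1", links)
```

Adding a further link out of s1 would add further paths, which illustrates why the number of purposes of a goal can only grow as planning proceeds.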

We will use a system similar to STATIC to create a 
set of control rules. A goal’s purpose specifies a set of 
control rules that apply in solving the goal. Since a goal 
may have more than one purpose, more than one set of 
control rules may apply. Each control rule that applies 
specifies some plan modification. STATIC/PRODIGY 
also matched conditions of a control rule against condi- 
tions provided by existing steps in order to further re- 
strict branching. We cannot do this in SNLP and keep 
completeness. Also, as a goal’s number of purposes in- 
creases, more control rules apply. This can happen after 
SNLP actually attempts to solve a goal. We have not 
figured out the best way to handle this yet. 

Our planner performs a best first search over the 
planning search space. One of our ranking functions 
sums the number of causal links with the number of 
open goals. This gives us an A* search that prefers 
plans with fewer causal links. We hope to use the prob- 
lem space graphs generated by STATIC to provide a 
better admissible ranking function. 
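The ranking function described here is simple enough to state directly. The sketch below only orders a fixed list of candidate plans by that rank; it does not generate successors, and the candidate tuples are invented for illustration.

```python
import heapq
from itertools import count

def rank(num_causal_links, num_open_goals):
    """Ranking function from the text: causal links plus open goals."""
    return num_causal_links + num_open_goals

def best_first_order(candidates):
    """Pop (name, links, open_goals) candidates in order of increasing rank."""
    tie = count()  # insertion counter, so equal ranks never compare names
    heap = [(rank(links, goals), next(tie), name)
            for name, links, goals in candidates]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

order = best_first_order([("A", 3, 2), ("B", 1, 1), ("C", 2, 2)])
```

In a full planner the heap would also receive the refinements of each popped plan, with the same rank deciding which partial plan is expanded next.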

Research Status 

Currently we have implemented the planner, and a very 
simple version of ALPINE, which assigns a priority to 
a proposition’s predicate name. We quickly discovered 
that we need a more refined priority function. One way 
to get this function is to observe that some goal propo- 
sitions can never be added by an action. We can use 
these propositions to replace a small set of highly vari-
ablized action descriptions with a larger set of actions
in which some of the variables are bound to constants.
From this new set we can generate a set of priorities 
for propositions where some of the variables have been 
bound to constants. We have not implemented this yet. 
ABSTRACT 

This paper describes an investigation into 
the structure of representations of sets of 
actions, utilizing semigroup theory. The goals 
of this project are twofold: to shed light on the 
relationship between tasks and representations, 
leading to a classification of tasks according to 
the representations they admit; and to develop 
techniques for automatically transforming 
representations so as to improve 
problem-solving performance. A method is 
demonstrated for automatically generating 
serial algorithms for representations whose 
actions form a finite group. This method is 
then extended to representations whose actions 
form a finite inverse semigroup. 

Introduction 

This paper describes an algebraic approach 
to building systems that can automatically 
change their representations. Representation 
change, also called reformulation, has long 
been recognized as an essential component of 
intelligent systems (Amarel, 1968) (Simon, 
1969), but the automation of representation 
change has proved elusive. The understanding 
of representations and their properties lags far 
behind the understanding of search methods 
and their properties. This difference is 
reflected in the structure of AI programs: most 
contain a large number of search methods 
acting on a single representation. This was 
true for GPS, and remains true today, e.g., 
SOAR, Prodigy, and automated theorem 
provers, which typically possess a multitude of
variants of resolution acting on a
representation in normal form. 

This paper attempts to begin to rectify this 
situation, with a formal investigation of the 
properties of representations, and algorithms 
for representation change. This paper does not 
examine representation changes that are 
heuristic or inductive (these have been 
investigated by a large number of researchers 
in machine learning), but rather deductive 
reformulations that preserve logical soundness 
and completeness: no solvable problems are 
rendered unsolvable, nor are unsolvable 
problems rendered solvable. 

Deductive reformulations are much less 
well understood than heuristic or inductive 
transformations. In this type of reformulation, 
representations are not changed to alter their 
logical properties, but are changed to improve 
their computational properties, especially their 
search and input characteristics. As we will 
see, these computational properties can be 
well characterized algebraically. 

Representations 

It is well understood that representation 
selection sets the stage for both problem 
solving and learning, and that the choice of 
representation can greatly affect the cost of 
both. The examples in the next section will 
illustrate that the proper choice of 
representation is data-dependent, so how can a 
system know the best concept language for the 
data before seeing the data? 
This leads to a problem: a system must choose
a representation before it can know what 
representations would be good. 

In AI practice, this problem is resolved by 
the humans who develop the system. They 
have prior knowledge of the classes of tasks 
that the system will face and the demands that 
will be placed on it, and they engineer a 
representation whose properties will aid the 
system in meeting these demands. 

This has led to the current situation in AI: 
research is concentrated almost exclusively on 
development of planning and learning 
algorithms, and these algorithms are cast as 
search in problem spaces and concept spaces, 
respectively. The search through the space of 
representations is performed by skilled
humans. Although research on planning and
learning algorithms is certainly important, the 
neglect of research on reformulation has led to 
three major limitations of AI research. 

First, a wide variety of truly autonomous 
systems cannot be constructed as long as 
skilled humans are required to engineer the 
representations for the systems. It will only 
remain possible to build expert systems that 
can function in small, static domains, in which 
the representational demands on the system do 
not change over time. This limitation has 
special significance for the application of AI 
techniques to robotics. 

Second, the dependence of planning and 
learning algorithms on the properties of the 
representation is unstated in AI papers, 
thereby raising questions about the validity of 
the conclusions drawn about the properties of 
the algorithms, as it is unclear how to separate 
the properties of the algorithms from those of 
the representations. This leads to the 
unsettling possibility that researchers may 
have (subconsciously) engineered
representations that cause planning or learning
algorithms to perform well. If true, research 
results would be irreproducible (as other 
researchers might engineer different 
representations), and the underpinnings of AI 
as a science would be weakened.


Third, this leads to very narrow 
conceptions of problem solving activities. For 
example, research in planning has focused on 
algorithms that construct a set of behaviors for 
the agent to exhibit for a particular task. These 
behaviors may be organized so that different 
behaviors are executed depending on runtime
conditions in the environment, but the
limitation is that planning has been conceived
as the process of constructing this set of
behaviors. This conception has been
challenged by recent work (Agre & Chapman,
etc.) which argues that in complex domains the
number of behaviors necessary for a successful
plan is too large to construct before execution.
Instead, the system plans by designing a
program that will generate at execution time a
behavior to attain the goal. In this new
conception, planning becomes a design
process, consisting of repeated cycles of
design and performance testing. The design
steps consist of both representation design and
algorithm design, and the performance testing
is the actual execution. In this way, the
planner is constantly redesigning the program
(if necessary) during execution. (Note that the
bees of Agre & Chapman or the robot insects
of Brooks are programs that are designed by
humans, so the planning was done by the
humans.) The classical conception of planning
as constructing a set of behaviors is a special 
case, when the program simply consists of the 
actions to perform in various situations. 

Similarly, research in learning has
primarily focused on algorithms for
constructing a new hypothesis from an existing
hypothesis, given a set of new examples. But, 
beginning with the work of Mitchell and 
Utgoff, machine learning researchers began to 
realize the importance of representation 
design. It is by now widely recognized in the 
machine learning community that the design of 
the hypothesis language is crucial in efficient 
learning: the language must be restricted to 
permit the system to successfully identify a 
hypothesis without having to see all the 
possible cases, but choosing a sublanguage 
that doesn't contain any good hypotheses will 
lead to failure no matter what learning 
algorithm is used. Recently, conferences and 
workshops have been held on this topic, and 
books have begun to appear. In this respect,
the machine learning community is ahead of
the planning community. Researchers in 
planning should note that Amarel’s seminal 
paper dealt with reformulation in problem 
solving, not in learning. 

As a result of these three limitations, we 
are led to the conclusions that representation 
change is a necessary capability of any 
autonomous intelligent system, and that AI 
needs a fuller understanding of representations 
and their properties. In the next section, we 
consider the types of properties that 
characterize good and bad representations, to 
understand the goals of reformulation. 

Computational Properties of 
Representations 

Representations vary as to the amount of 
search they require, the input they require, 
their memory usage, etc. Similarly, agents 
vary as to their memory, sensory, and motor 
capabilities. Tasks have constraints on usage 
of various resources, e.g., time. We have 
argued that only agents with the capability to 
change representations can select a 
representation whose characteristics are 
appropriate for the particular task at hand. For 
example, the agent may not need to find a 
globally optimal solution, but only one that 
meets certain criteria; in this case, the agent 
may be able to simplify the task description, 
and find an acceptable solution more quickly. 

In order to investigate the relevant 
properties of representations, we must first 
choose the appropriate tools. Virtually all of 
the knowledge representation community uses 
the tools of logic to investigate the properties 
of representations. Certainly, the soundness, 
completeness, and complexity of a 
representation are important properties; 
however, in this paper we are concerned with 
the computational properties of a 
representation, rather than whether it models 
the task environment accurately in all cases. 
The computational properties of a 
representation are independent of soundness, 
completeness, or complexity. To see this, 
consider the following two representations of 
the two-disk Towers of Hanoi: 


Representation TOH1: Let us number the 
nine states of the 2-disk Towers of Hanoi: 



[Figure: the nine numbered states of the 2-disk Towers of Hanoi, with arrows for the actions "x" and "y".] 


Let the two possible actions be denoted by 
"x" and "y". "x" moves the small disk right one 
peg (wrapping around from peg 3 to peg 1), 
and "y" moves the large disk one peg to the 
left (wrapping around from peg 1 to peg 3). In 
the figure, "x" is shown by narrow, 
counterclockwise arrows, and "y" is shown by 
thick, counterclockwise arrows. 

Representation TOH2: Let the states be 
numbered in the same way, and let the six 
possible actions be: 

X1 = move the top disk from peg 1 to peg 2 

X2 = move the top disk from peg 2 to peg 3 

X3 = move the top disk from peg 3 to peg 1 

Y1 = move the top disk from peg 1 to peg 3 

Y2 = move the top disk from peg 2 to peg 1 

Y3 = move the top disk from peg 3 to peg 2 

These two representations are both sound 
and complete. Furthermore, they have exactly 
the same complexity, as they have the same 
number of states and possible actions in each 
state. The only difference is in the labeling of 
the actions. 

Yet, these two representations have very 
different computational properties. The first 
representation decomposes: it has a subgoal 
reduction, whereas the second has none. This 
decomposition permits an agent to solve each 
subgoal independently, and then compose the 
solutions to form a solution to the task. In this 
case, the set of actions decomposes into 
actions for moving the larger disk and actions 
for moving the smaller disk, permitting the 
system to first bring one disk to its goal 
position and then bring the other disk to its 
goal position without disturbing the position of 
the first disk. As we will see, it is possible to 
solve either disk first and then the other. 

Certainly it is possible for a system using 
the second representation to bring one disk to 
its goal position and then bring the other disk 
to its goal position; obviously, it must do so to 
solve the task. However, there is no structure 
in this second representation of actions that 
can be used to find this decomposition, i.e., 
the actions do not admit a subgoal reduction. 

Thus, we see that a good representation 
facilitates problem solving by structuring the 
knowledge in a way that helps the agent to 
identify relevant actions - the actions for the 
first subgoal. We also see that we cannot 
characterize this structure by considering 
soundness, completeness, or complexity. This 
approach is consonant with the ideas of Doyle 
& Patil (1991), who argue that "logical 
soundness, completeness, and worst-case 
complexity are inadequate measures" for 
evaluating representations. We are therefore 
led to consider an alternative formal method of 
characterizing the structure of sets of 
actions. One of the primary purposes of this 
paper is to show that the tools of algebra are 
well suited to this purpose. 

In particular, the method used in this work 
is to apply the theory of semigroups to the 
analysis of representations of actions, to yield 
both an intuitive understanding of 
representations and algorithms for 
reformulation. The theory of semigroups is 
important in the study of algebraic linguistics 
(Chomsky, 1957), (Lallement, 1979), so it is 
not surprising that it can prove useful in the 
study of the languages used to represent tasks. 

This paper describes only the semantics of 
representation change, i.e., it examines the 
structure of sets of actions. The various 
symbolic encodings of each such structure in 
terms of state description functions are 
agent-dependent and deserving of a separate 
treatment, and so will be examined in a 
subsequent paper. 

A Prototypical Example of 
Reformulation 

To get a more intuitive feel for the issues 
involved in reformulation, let us first consider 
a familiar example. When we are posed the 
problem of finding the volume of a cylinder in 
an arbitrary position, the first thing we do is 
change the coordinates of the problem so that 
an axis passes lengthwise through the middle 
of the cylinder (the coordinates are moved, not 
the cylinder). 

We do this because otherwise the 
calculations are very expensive. For example, 
we could compute derivatives at two places on 
the edge of one of the circular ends, find 
perpendicular lines (with slopes that are 
negative reciprocals of the tangent lines), find 
the intersection point of these lines (the center 
of the circle), and use the distance formula to 
find the radius of the circle. We could then 
compute the area of the circle, and apply the 
distance formula again to yield the length of 
the side of the cylinder. A final multiplication 
gives the volume. This is a very expensive 
procedure involving 3-dimensional 
calculations. Another computationally 
expensive possibility is performing an 
integration to find the volume. 

Changing the basis gives a nice 
representation of the cylinder. Now, all we 
need to do is read the x-value when y and z are 
both zero to get the radius, and read the 
z-value when x and y are both zero to get the 
length. Just two multiplications are required 
(squaring the radius and multiplying the area 
by the length). No 3-dimensional computations 
are used. The 3-dimensional problem has been 
decomposed into two 2-dimensional 
subproblems: finding the area of the circle and 
extending this area through the length of the 
cylinder. Note especially the reduction in the 
perceptual and memory abilities required of 
the problem solver: it need only be able to 
read values at which the surface intersects 
coordinate planes, which are single numbers, 
and need only manipulate two numbers at a 
time. This contrasts with the original 
representation, which requires the problem 
solver to read triples of numbers, and to be 
able to simultaneously store several numbers 
at a time, e.g., the equations of the two lines 
that intersect at the center of the circle. Low 
memory and perceptual (input) cost are key 
computational properties of a good 
representation, and hence are important goals 
for reformulation. 
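
The coordinate change can be illustrated with a small sketch. It assumes, purely for illustration, that the cylinder is given by the centers of its two circular ends and one point on the rim of the base; the function name and this input format are not part of the original discussion. After moving the origin to the base center and taking the cylinder's axis as the new z-axis, the radius and the length are each a single number.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, k):
    return tuple(k * x for x in a)

def norm(a):
    return math.sqrt(dot(a, a))

def cylinder_volume(base_center, top_center, rim_point):
    """Volume of a cylinder given in arbitrary coordinates (hypothetical input format)."""
    axis = sub(top_center, base_center)
    length = norm(axis)                  # read off along the new z-axis
    z_hat = scale(axis, 1.0 / length)    # the new z-axis runs along the cylinder
    rel = sub(rim_point, base_center)    # move the origin to the base center
    along = dot(rel, z_hat)              # z-coordinate of the rim point
    radial = sub(rel, scale(z_hat, along))
    radius = norm(radial)                # read off along the new x-axis
    return math.pi * radius * radius * length

# A cylinder of radius 2 and length 5 whose axis points along (3, 4, 0).
volume = cylinder_volume((1, 2, 3), (4, 6, 3), (1, 2, 5))
```

Once the basis has been changed, only the two scalars `radius` and `length` are needed, which mirrors the reduction in memory and input cost described above.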



The subproblems are obtained by 
projecting the cylinder onto the x-axis and 
z-axis, respectively. In the new coordinates, 
good subproblems are obtained by projection. 
In the original coordinates, this is not the 
case; projecting onto any coordinate axis or 
coordinate plane yields a subproblem that is 
not cheaper to solve than the original problem. 

As long as the coordinate change process is 
not too expensive, this will result in a net 
savings, especially if many computations are 
performed on the cylinder. Good subproblems 
are characterized in this case by their 
dimensionality; the lower the dimensionality, 
the better the subproblem. The goal of general 
reformulation is to find a representation that 
facilitates problem solving by permitting 
projection to more tractable subproblems, i.e., 
by permitting creation of good abstractions. 

The 2x2x2 Rubik's Cube 

It is remarkable that we can use this 
approach to reformulation on tasks that appear 
very different. Let us examine the 2x2x2 
Rubik's Cube. The techniques we will use here 
scale up; we are using this small Cube to save 
space in the paper. Let the 8 cubicles (the 
fixed positions) in the 2x2x2 Cube be 
numbered as in the figure (8 is the number of 
the hidden cubicle). 

Number the cubies (the movable, colored 
cubes) similarly, and let the goal be to get 
each cubie in the cubicle with the same 
number. For brevity of presentation, we will 
consider only 180° twists of the cube. 


Let f, r, and t denote 180° clockwise turns 
of the front, right, and top, respectively (cubie 
8 is held fixed; Dorst (1989) shows that this is 
equivalent to factoring by the Euclidean group 
in three dimensions). Note that this cubie 
numbering is just a shorthand for labeling each 
cubie by its unique coloring. This holds true 
for the Cube with only 180° twists, as position 
determines orientation. 


Finding Serial Algorithms for Tasks 
Represented by Groups 


Finite groups can be reformulated utilizing 
group representation theory to find coset 
decompositions. This is illustrated on the 
2x2x2 Cube. We use group representation 
theory to represent f, r, and t as matrices: 


( 


f = 


V 


0 

0 

1 

0 

0 

0 

0 


( 


r « 


V 


1 

0 

0 

0 

0 

0 

0 


0 1 0 0 0 0 

1 0 0 0 0 0 

0 0 0 0 0 0 

0 0 0 0 1 0 

0 0 0 1 0 0 

0 0 1 0 0 0 

0 0 0 0 0 1 ; 

0 0 0 0 0 0 > 

0 1 0 0 0 0 

1 0 0 0 0 0 

0 0 0 0 0 1 

0 0 0 1 0 0 

0 0 0 0 1 0 

0 0 1 0 0 0 , 
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0 1 0 0 0 0 0 

1 0 0 0 0 0 0 

0 0 1 0 0 0 0 

t = 0 0 0 0 1 0 0 

0 0 0 1 0 0 0 

0 0 0 0 0 1 0 

, 0 0 0 0 0 0 1 , 

These matrices are 7-dimensional, 
corresponding to the 7 unsolved cubies. The 
reformulation method consists of finding 
eigenvectors of eigenvalue 1; these are the 
invariants. Any invariant of all the actions is 
irrelevant for the task, and can be removed, by 
first changing the coordinate system so that 
the invariant eigenvectors are axes, and then 
projecting to the noninvariant subspace, 
removing all irrelevant information at once. In 
this case, the eigenvectors are: 

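\[
r:\ \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},\ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \text{ for } \lambda = 1, \text{ and } \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \text{ for } \lambda = -1
\]

\[
f:\ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \text{ for } \lambda = 1, \text{ and } \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \text{ for } \lambda = -1
\]

\[
t:\ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \text{ for } \lambda = 1, \text{ and } \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} \text{ for } \lambda = -1
\]

and the common invariant eigenvector is: 

\[
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\]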


Note that we have abbreviated the above 
eigenvectors to save space; they are actually 
7-vectors. We then change the basis, yielding 
the new representations for r, f, and t: 


1 0 0 0 0 0 0 

0 1 0 0 0 0 0 

0 0 ] 4 0 0 0 

r= 0 0 4 4 0 0 0 

0 0 0 0 1 0 0 

0 0 0 0 0 0 -1 

, 0 0 0 0 0 -1 0 , 

' 1 0 0 0 0 0 0 ^ 

0 1 0 0 0 0 0 

00 a 4 0 0 0 

f= 0 0-4 f 000 

0 0 0 0 0 0 -1 

0 0 0 0 0 1 0 

, 0 0 0 0 -1 0 0 , 

r 1 0 0 0 0 0 0 ' 

0 1 0 0 0 0 0 

0 0 -1 0 0 0 0 

t= 0 0 0 1 0 0 0 

0 0 0 0 0 -1 0 

0 0 0 0 -1 0 0 
, 0 0 0 0 0 0 1 J 

This procedure computes the irreducible 
invariants of a group. The irreducible factors 
of dimension 1, 1, 2, and 3 are found along the 
diagonals of the matrices. Projecting to these 
subspaces yields two interesting subproblems: 


On cubelets 1, 2, and 3, the subgroup 
generated is {i, r, f, t, rt, tr}. 

The above figure is read right-to-left; solved 
cubicles are shaded, "i" denotes the identity 
(null) action. The average number of moves 
required to solve the Cube in this way is 5.17. 


On cubelets 4, 5, 6, and 7, the subgroup 
generated is 

{i, r, f, t, rf, rt, fr, ft, tr, tf, rfr, rft, rtr, rtf, 
frt, ftr, ftf, trf, tfr, rfrt, rftr, rftf, rtrf, rtfr}. 


Using each set of matrices as generators, 
we get two subgroups of actions, the second of 
which is a faithful representation of the whole 
group. The first subgroup moves cubies 1, 2, 
and 3, while holding 4, 5, 6, and 7 in position. 
The second subgroup moves cubies 4, 5, 6, and 
7 while holding 1, 2, and 3 in their positions. 
We then repeat this procedure on the first set 
of actions to obtain a full set of prime factors 
of the Rubik group. 
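
The sizes of these subgroups can be checked with a short sketch. It reads the three matrices given earlier as permutations of the seven cubicles (f swaps 1 with 3 and 4 with 6, r swaps 2 with 3 and 4 with 7, t swaps 1 with 2 and 4 with 5); this reading, and all function names, are assumptions made for the illustration.

```python
def perm(*swaps, n=7):
    """Permutation of 1..n built from disjoint swaps; p[i-1] is the image of i."""
    p = list(range(1, n + 1))
    for a, b in swaps:
        p[a - 1], p[b - 1] = b, a
    return tuple(p)

# Assumed reading of the 7x7 matrices as permutations of cubicles.
f = perm((1, 3), (4, 6))
r = perm((2, 3), (4, 7))
t = perm((1, 2), (4, 5))

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i - 1] for i in p)

def generate(gens):
    """Closure of the generators under composition (a finite group here)."""
    identity = tuple(range(1, len(gens[0]) + 1))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

def restrict(g, points):
    """Images of the given points under g."""
    return tuple(g[i - 1] for i in points)

group = generate([f, r, t])
on_123 = {restrict(g, (1, 2, 3)) for g in group}
on_4567 = {restrict(g, (4, 5, 6, 7)) for g in group}
```

Under this reading the whole group has 24 elements, its action on cubelets 1, 2, and 3 has 6 distinct elements, and its action on cubelets 4, 5, 6, and 7 has 24, which agrees with the two lists above and with the second action being faithful.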


These factors can be assembled in different 
ways to form serial algorithms. There is more 
than one way to decompose this group. This is 
analogous to the different ways of multiplying 
the prime factors of a number. Five serial 
algorithms are obtained in this way. We now 
examine two of them. 

Serial Algorithm 1: 


\[ R = \{i, r, f\} \circ \{i, t\} \circ \{i, rtft\} \circ \{i, frtr\} \]


1. One of { i,r,f } brings cubelet 3 
into cubicle 3. 

2. One of { i,t } brings cubelets 1 and 
2 into their places. 

3. One of { i, rtft } brings (4,6) and 
(5,7) in the proper planes ("the 
front face looks right"). 

4. One of { i, frtr } finishes the Cube. 



Each step in the decomposition 
corresponds to bringing a feature to its goal 
value. Subsequent steps hold that value 
invariant. In this way, sensory planning is 
decomposed, i.e., the agent need only sense 
part of the Cube at each step. For example, the 
first step solves cubicle 3. Knowing the colors 
of the solved cubicle 8, we know the colors of 
cubicle 3 - it has the same color as the bottom 
of cubicle 8, and two new colors. There is only 
one such cubie, and it must be in one of three 
locations: in its goal position, or in cubicle 1 
or 2. The agent need only look in those 3 
locations to determine what action to take. 
Once cubicle 3 is solved, it need not be sensed 
again. The agent next solves cubicles 1 and 2; 
it need only sense either position to see if the 
proper cubie is there; if so, it does nothing, 
otherwise, it twists the top. Finally, the agent 
uses macros to solve the remaining four 
cubicles, by examining the front face to see if 
it's a uniform color, and then examining the 
top or right face to see if it is of uniform color. 
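
The four steps of Serial Algorithm 1 can be simulated on every reachable state. The sketch below is illustrative only: it assumes that f, r, and t act as the cubicle swaps read from the matrices given earlier, that all 24 states are equally likely, and that the subgoal of each step can be tested as shown in `STEPS`. The names are hypothetical.

```python
from collections import deque

# Assumed cubicle swaps for the three 180-degree twists.
MOVES = {"f": {1: 3, 3: 1, 4: 6, 6: 4},
         "r": {2: 3, 3: 2, 4: 7, 7: 4},
         "t": {1: 2, 2: 1, 4: 5, 5: 4}}
SOLVED = tuple(range(8))  # state[p] = cubie in cubicle p; index 0 is padding

def twist(state, letter):
    new = list(state)
    for src, dst in MOVES[letter].items():
        new[dst] = state[src]
    return tuple(new)

def run(state, word):
    # Every word used below is its own inverse, so reading order does not matter here.
    for letter in word:
        state = twist(state, letter)
    return state

def all_states():
    seen, queue = {SOLVED}, deque([SOLVED])
    while queue:
        s = queue.popleft()
        for letter in MOVES:
            n = twist(s, letter)
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen

# (candidate words, subgoal test) for each step; "" is the identity i.
STEPS = [
    (["", "r", "f"], lambda s: s[3] == 3),
    (["", "t"], lambda s: s[1] == 1 and s[2] == 2),
    (["", "rtft"], lambda s: {s[4], s[6]} == {4, 6}),
    (["", "frtr"], lambda s: s == SOLVED),
]

def solve(state):
    """Run the serial algorithm and return the number of twists used."""
    cost = 0
    for candidates, goal in STEPS:
        word = next(w for w in candidates if goal(run(state, w)))
        state, cost = run(state, word), cost + len(word)
    assert state == SOLVED
    return cost

states = all_states()
costs = [solve(s) for s in states]
average = sum(costs) / len(costs)
```

Under these assumptions every state is solved, and the average is 124/24, which rounds to the 5.17 moves quoted for this decomposition.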

This reduction in the complexity of sensing 
(the input requirements) is one of the salient 
aspects of task decomposition. In large, 
realistic tasks, it is not possible to fully sense 
the world, e.g., in a changing environment one 
part of the world may change while the agent 
is sensing another part. Even when possible, it 
is often too expensive. A good decomposition 
can greatly reduce the sensory expense. This 
gain, however, is at the cost of suboptimality 
of the solution. The above decomposition has 
average cost of 5.17, whereas an optimal 
solution is of average length 2.46. There are 
better decompositions. We now examine the 
best decomposition. 

Serial Algorithm 2: 
1. One of { i,f,fr,ft } brings cubelet 6 
into cubicle 6. 

2. One of { i,r,rt } brings 3,7 in place 
(bottom layer correct). 

3. One of { i, t} finishes the Cube. 

The average number of moves to solve the 
Cube using this decomposition is 2.73. 

Each decomposition can be thought of as a 
coordinate system whose origin is the goal 
state. For example, the second serial algorithm 
can be thought of as a 3-dimensional 
coordinate system (a,b,c) where a is in 
{i,f,ft,fr}, b is in {i,r,rt}, and c is in {i,t} (Leo 
Dorst produced the geometric interpretation of 
this coordinate system): 



The first coordinate brings us to the proper 
hexagon, the second coordinate to the proper 
pair of opposing vertices in the hexagon, and 
the third coordinate to the goal state. 

In this coordinate system, each subproblem 
is obtained by projecting onto that coordinate. 
For example, projecting to the first coordinate 
yields a 4-element state space whose states are 
the hexagons. Reaching the goal state 
(hexagon) in this space is equivalent to the 
subproblem of bringing cubie 6 into its goal 
location. 
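
The coordinates (a,b,c) of each state can be computed directly. The sketch below assumes that f, r, and t act as the cubicle swaps read from the matrices given earlier, and that a word such as "fr" is applied with its rightmost twist first; under that reading each listed word achieves its subgoal. Both assumptions, and the names used, are for illustration only.

```python
# Assumed cubicle swaps for the three 180-degree twists.
MOVES = {"f": {1: 3, 3: 1, 4: 6, 6: 4},
         "r": {2: 3, 3: 2, 4: 7, 7: 4},
         "t": {1: 2, 2: 1, 4: 5, 5: 4}}
SOLVED = tuple(range(8))  # state[p] = cubie in cubicle p; index 0 is padding

def twist(state, letter):
    new = list(state)
    for src, dst in MOVES[letter].items():
        new[dst] = state[src]
    return tuple(new)

def run(state, word):
    """Apply a word with the rightmost twist first (assumed convention)."""
    for letter in reversed(word):
        state = twist(state, letter)
    return state

def all_states():
    seen, frontier = {SOLVED}, [SOLVED]
    while frontier:
        s = frontier.pop()
        for letter in MOVES:
            n = twist(s, letter)
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return seen

# One axis per step of Serial Algorithm 2: (candidate words, subgoal test).
AXES = [
    (["", "f", "fr", "ft"], lambda s: s[6] == 6),
    (["", "r", "rt"], lambda s: s[7] == 7 and s[3] == 3),
    (["", "t"], lambda s: s == SOLVED),
]

def coordinates(state):
    """Return the triple (a, b, c) that brings the state to the goal."""
    coords = []
    for candidates, goal in AXES:
        word = next(w for w in candidates if goal(run(state, w)))
        coords.append(word or "i")
        state = run(state, word)
    assert state == SOLVED
    return tuple(coords)

table = {s: coordinates(s) for s in all_states()}
```

In this sketch the 24 states receive 24 distinct triples, and the goal state has coordinates (i, i, i), so the triples behave as coordinates with the goal at the origin.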

From the Rubik's Cube example, we see 
that we can view a representation as a 
coordinate system whose axes are the 
components of the task. Using group 
representation theory, we represent the actions 
as matrices. Changing the basis so that 
invariant eigenvectors are axes eliminates 
irrelevant information, and identifies a good 
task decomposition. We now formalize this 
notion in a general way. 
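
The invariants used in the Rubik's Cube example can be checked with a few lines. The sketch assumes the permutation reading of f, r, and t taken from the matrices given earlier. A vector is fixed by a set of permutation matrices exactly when it is constant on each orbit, so the indicator vectors of the orbits span the common invariant subspace.

```python
# Assumed cubicle swaps for the three twists.
SWAPS = {"f": [(1, 3), (4, 6)], "r": [(2, 3), (4, 7)], "t": [(1, 2), (4, 5)]}
N = 7

def permutation_matrix(swaps):
    image = list(range(N))
    for a, b in swaps:
        image[a - 1], image[b - 1] = b - 1, a - 1
    return [[1 if image[col] == row else 0 for col in range(N)] for row in range(N)]

def apply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

MATRICES = {name: permutation_matrix(s) for name, s in SWAPS.items()}

def orbits():
    """Orbits of the cubicles under all three twists (simple union-find)."""
    parent = list(range(N))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for swaps in SWAPS.values():
        for a, b in swaps:
            parent[find(a - 1)] = find(b - 1)
    groups = {}
    for i in range(N):
        groups.setdefault(find(i), []).append(i + 1)
    return sorted(groups.values())

ORBITS = orbits()
INVARIANTS = [[1 if i + 1 in orbit else 0 for i in range(N)] for orbit in ORBITS]
fixed_by_all = all(apply(M, v) == v for M in MATRICES.values() for v in INVARIANTS)

# An eigenvector of r with eigenvalue -1, written out as a 7-vector.
minus = [0, -1, 1, 0, 0, 0, 0]
flipped_by_r = apply(MATRICES["r"], minus) == [-c for c in minus]
```

Under this reading there are two orbits, {1, 2, 3} and {4, 5, 6, 7}, giving two independent common invariants, which is consistent with the two one-dimensional factors on the diagonal.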

Coordinate Systems in 
Transformation Monoids 

We are interested in the structure of 
transformation monoids, so a natural first step 
is to examine Green's relations (Lallement, 
1979). Green's relations are defined as 
follows: given any semigroup S, we define the 
following equivalence relations on S: 

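\[
\begin{aligned}
a \mathrel{R} b &\iff aS^1 = bS^1 \\
a \mathrel{L} b &\iff S^1 a = S^1 b \\
H &= R \cap L \\
D &= R \vee L \\
a \mathrel{J} b &\iff S^1 a S^1 = S^1 b S^1
\end{aligned}
\]

where $S^1$ denotes the monoid corresponding to 
S with an identity element adjoined. 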

Intuitively, we can think of these relations in 
the following way: aRb iff for any plan that 
begins with "a", there exists a plan beginning 
with "b" that yields the same behavior; aLb iff 
for any plan that ends with "a", there exists a 
plan ending with "b" that yields the same 
behavior; aHb indicates functional 
equivalence, in the sense that for any plan 
containing an "a" there is a plan containing 
"b" that yields the same behavior; two 
elements in different D-classes are 
functionally dissimilar, in that no plan 
containing either can exhibit the same 
behavior as any plan containing the other. 

Let us examine these relations in a 
representation for the Towers of Hanoi. Let Q 
= { 1,2,3,4,5,6,7,8,9} be the set of states for 
the 2-disk Towers of Hanoi. Let A be the 
semigroup of transformations generated by: 
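\[
x = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 3 & 1 & 5 & 6 & 4 & 8 & 9 & 7 \end{pmatrix}
\qquad
y = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \multicolumn{9}{c}{\cdots} \end{pmatrix}
\]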


"x" moves the small disk right one peg 
(wrapping around from peg 3 to peg 1), and 
"y" moves the large disk one peg to the left 
(wrapping around from peg 1 to peg 3). Then 
A is a semigroup with 31 elements. We name 
this representation TOH1. Each element of A 
is a partial function on the set of states. 
Green’s relations in A are: 


DOPI 

D1: x, xx, xxx 

D2: 

y, yxxy, yxxyxxy        | yx, yxxyx, yxxyxxyx       | yxx, yxxyxx, yxxyxxyxx 
xy, xyxxy, xyxxyxxy     | xyx, xyxxyx, xyxxyxxyx    | xyxx, xyxxyxx, xyxxyxxyxx 
xxy, xxyxxy, xxyxxyxxy  | xxyx, xxyxxyx, xxyxxyxxyx | xxyxx, xxyxxyxx, xxyxxyxxyxx 


where the R-classes are horizontal and the 
L-classes are vertical. The D-classes model the 
structure of the task representation, in the 
sense that the n-th D-class is equivalent to the 
n-disk Towers of Hanoi. Adding additional 
disks merely adds additional D-classes. Each 
D-class Dn contains the macros that move all 
of the disks 1 through n. 
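
Green's relations for a small transformation semigroup can be computed by brute force. The sketch below is an illustration under stated assumptions: states are encoded as pairs (large peg, small peg) with pegs 0 to 2 rather than with the numbering used above, "right" is taken to mean adding one modulo 3, and the product ab means "apply a, then b". All names are hypothetical.

```python
PEGS = 3
# Assumed encoding (not the paper's numbering): a state is (large_peg, small_peg).
STATES = [(big, small) for big in range(PEGS) for small in range(PEGS)]
INDEX = {q: i for i, q in enumerate(STATES)}

def encode(fn):
    """A partial function on states as a tuple of target indices (None = undefined)."""
    return tuple(None if fn(q) is None else INDEX[fn(q)] for q in STATES)

def small_right(q):
    big, small = q
    return (big, (small + 1) % PEGS)

def large_left(q):
    big, small = q
    dest = (big - 1) % PEGS
    # Legal only if the small disk is on neither the source nor the destination peg.
    return None if small in (big, dest) else (dest, small)

x, y = encode(small_right), encode(large_left)

def mul(a, b):
    """Product ab: apply a, then b; undefined stays undefined."""
    return tuple(None if t is None else b[t] for t in a)

def generate(gens):
    elems, frontier = set(gens), list(gens)
    while frontier:
        a = frontier.pop()
        for g in gens:
            c = mul(a, g)
            if c not in elems:
                elems.add(c)
                frontier.append(c)
    return elems

A = generate([x, y])

# xxx is the identity, so A already contains an identity and S^1 = A here.
def right_ideal(a):
    return frozenset(mul(a, s) for s in A)

def left_ideal(a):
    return frozenset(mul(s, a) for s in A)

def two_sided(a):
    return frozenset(mul(mul(s, a), u) for s in A for u in A)

def partition(key):
    blocks = {}
    for a in A:
        blocks.setdefault(key(a), []).append(a)
    return list(blocks.values())

R = partition(right_ideal)
L = partition(left_ideal)
H = partition(lambda a: (right_ideal(a), left_ideal(a)))
D = partition(two_sided)  # in a finite semigroup, D coincides with J
```

With this encoding the generated semigroup has 31 elements, and its D-classes have sizes 1, 3, and 27; the class of size 27 splits into three R-classes, three L-classes, and nine H-classes of three elements each, matching the 3 x 3 arrangement of D2.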

Let us examine D2. There are three 
subgroups in this D-class, containing the 
idempotents (an idempotent is an element x 
such that xx = x). The idempotents are in bold 
type. These three H-classes are maximal 
subgroups of A and their generators are the 
macros for moving the large disk. Now we 
define a coordinate system for any semigroup. 

Definition. Let R be an R-class of a 
semigroup S, and let $H_\lambda$ ($\lambda \in \Lambda$) be the set of 
H-classes contained in R. A coordinate 
system for R is a selection of a particular 
H-class, denoted $H_1$, contained in R, and of 
elements $q_\lambda, q'_\lambda \in S^1$ with $\lambda \in \Lambda$, such that 
the mappings $x \mapsto x q_\lambda$ and $y \mapsto y q'_\lambda$ are 
bijections from $H_1$ to $H_\lambda$ and from $H_\lambda$ to $H_1$, 
respectively. A coordinate system for R is 
denoted by $[H_1; \{(q_\lambda, q'_\lambda) : \lambda \in \Lambda\}]$. 

This says to choose an H-class in a D-class, 
and find 1-1 mappings to all other H-classes in 
the same R-class. We are justified in calling 
this a coordinate system for a D-class, as any 
two representations of coordinate systems for 
any two R-classes are isomorphic, giving 1-1 
mappings to all the H-classes in the D-class 
(Lallement, 1979, p.46). 

This definition of coordinate system 
provides an intuitive conceptual framework for 
homomorphic reformulation. The groups in the 
decomposition can be viewed as levels of an 
abstraction hierarchy, or subproblems in a 
serialization. Each such decomposition yields 
a coordinate system, which is the index of the 
group in the decomposition, together with the 
indices given by the decomposition of the 
group, as illustrated in Rubik's Cube. Change 
of representation involves generating a 
different decomposition for the given monoid, 
and thus is a change of coordinate systems. 
This fits perfectly with the intuition we 
developed in the cylinder example. 

A good example of such reformulation is 
switching between serial algorithms for 
Rubik's Cube. Such reformulation is performed 
to match the unique characteristics of each 
decomposition to the characteristics of the 
agent and the requirements of the task, e.g., 
serial algorithm 2 has a lower expected search 
cost, whereas serial algorithm 1 never requires 
sensing cubicle 5. 

Each coordinate system generates a Rees 
matrix representation for A, permitting us to 
change basis within a semigroup and find 
serial algorithms in a manner analogous to the 
Rubik's Cube example. The reader is referred 
to Lallement (1979) for details of Rees matrix 
representations. Unfortunately, application of 
this technique to general semigroups can be 
very expensive computationally. Even the 
decomposition of small semigroups may 
require a large number of groups. The minimal 
number of groups required for a decomposition 
of a given semigroup is called the group 
complexity of that semigroup. It is not known 
whether the group complexity is decidable. 
This makes it very difficult to design good 
algorithms for finding such decompositions. 

Even more seriously, this form of 
representation change is not fully general. 
Homomorphic reformulation techniques 
elucidate the structure of a transformation 
semigroup, and thus possess a serious 
limitation: they can only preserve the structure 
of the semigroup, which limits the components 
they can produce. Such techniques can only 
remove extraneous information to uncover 
existing structure in a given representation. If 
this structure is not appropriate for efficient 
problem-solving, then homomorphic 
reformulation will be of little use. 

For example, in ABSTRIPS (Sacerdoti, 
1974) the relevant predicates must already 
exist in the initial representation, or else 
numbers cannot be assigned to them. Another 
example is provided by Subramanian's work: if 
the theory is stated in such a way that the 
irrelevant information is distributed among the 
statements of the theory, rather than 
concentrated in a subset of the statements, 
then it cannot be dropped without rendering 
the theory incapable of solving the task. TOH2 
is such a representation. 

For these reasons, we utilize the technique 
described in this section only within group 
machines. In the next section, we will show 
how to extend this technique to handle a wider 
class of semigroups: inverse semigroups. 

Related Work 

Reformulating tasks in this way has been 
described in various ways in the literature. 
Sacerdoti (1974), Knoblock et al. (1990), and 
Unruh & Rosenbloom (1989), among others, 
describe this reformulation as building an 
abstraction hierarchy. For example, in 
ABSTRIPS an ordering was imposed on the 
state-description predicates; bringing the 
predicates to their goal values in this order 
was viewed as top-down search in a hierarchy 
of abstract problem descriptions. 

Niizuma & Kitahashi (1985) and Banerji & 
Ernst (1977) describe this reformulation as 
projecting the states. In this view, an 
equivalence relation is imposed on the states, 
and the equivalence classes are the states in 
the quotient space. The only actions retained 
in the new representation are those that move 
between equivalence classes. 

Zimmer (1990) and Benjamin et al. (1990) 
describe this reformulation as decomposing the 
actions. In this approach, the set of sequences 
of actions is decomposed into two sets: those 
that are most relevant (according to some 
criterion) for solving the problem, and those 
that are less relevant. This induces an 
equivalence relation on the set of states, as in 
the previously described approach; a 
difference is that sequences of actions 
(macros) are used, rather than actions. The 
decomposition procedure is then repeated on 
the less relevant actions. 

A similar approach is taken by 
Subramanian (1987), who drops statements 
from a theory if the reduced theory can still 
derive the goal statement; the dropped 
statements are considered irrelevant. In these 
approaches, the state space is reduced by 
removing states that can no longer be reached 
by actions (statements) retained in the 
representation (theory). These approaches 
differ from the state projection approach 
mainly in the order in which states and actions 
are reformulated. In the state projection 
approach, a feature is chosen, inducing an 
equivalence relation that factors the states and 
decomposes the actions. In the action 
decomposition approach, the sequences of 
actions are decomposed according to some 
criterion, e.g., irrelevance (Subramanian) or 
enablement (Benjamin), which induces an 
equivalence relation on the states. 

Korf (1983) and Riddle (1986) describe 
this reformulation as serializing the subgoals. 
Finding a set of serializable subgoals for a 
problem permits solution of the problem by 
solving each subgoal in order. Korf points out 
that this reduces the exponent of the search, 
possibly resulting in a big gain in efficiency. 

Most of these authors refer to this type of 
reformulation in more than one of the above 
four ways. Also, this is not an exhaustive list 
of work on this type of reformulation. In the 
remainder of this paper, we refer to this type 
of representation change as homomorphic 
reformulation, as in Lowry (1990). 

The General Reformulation Problem 

Homomorphic reformulation changes the 
presentation of a semigroup, thus 
"re-presenting" it. The toughest cases of 
reformulation occur when the necessary 
problem-solving structures do not already 
exist, and involve transforming the semigroup 
into a transformationally equivalent semigroup 
with the desired structures. We call this the 
general reformulation problem. In keeping 
with our intuition that homomorphic 
reformulation is a coordinate change, we call 
non-homomorphic reformulation a 
deformation, because it changes the structure 
of the set of actions. 

We begin our examination of the general 
reformulation problem by describing a 
representation for the Towers of Hanoi that 
lacks good decompositions. We then define 
transformational equivalence, and give an 
algorithm for computing transformational 
equivalence for a useful class of semigroups. 

An Example: TOH2 

In TOH2, the only feature available to the 
agent is what disk is on top of each peg. This 
is a sound and complete theory of the 2-disk 
Towers of Hanoi, just as TOH1 is. The search 
complexity of this representation is exactly the 
same as for TOH1, because the states are the 
same, the same number of actions are 
executable in any state, and the solutions are 
of the same length. Thus, we see the 
insufficiency of logical completeness, 
soundness, and worst-case complexity for 
evaluating representations. 

The actions of TOH2 do not mention the 
disk that is moved, and no abstractions can be 
generated. We cannot find an abstraction 
hierarchy that first solves the large disk, then 
the small disk, because the set of actions 
cannot be partitioned into moves for each disk. 
Certainly the agent can first bring the large 
disk to the goal peg, then the small disk, but 
as was pointed out earlier, there is no structure 
in this representation of the set of actions that 
can be used to find that subgoal ordering, and 
the set of actions has no decomposition. No 
matter how we project these actions, we end 
up with all six of them. Thus, we cannot apply 
the type of reformulation we applied to the 
cylinder or to Rubik's Cube. No 
re-presentation of this transformation monoid 
will help; we need a new monoid of actions. 

This is a different semigroup than in 
representation TOH1, and its structure 
does not reflect the structure of the task in as 
helpful a manner. Relevant distinctions are not 
made, e.g., between moving the larger disk 
from peg 1 to peg 2 and moving the smaller disk 
between peg 1 and peg 2; irrelevant distinctions 
are made, e.g., between moving a disk from 
peg 1 to peg 2 and moving the same disk from 
peg 2 to peg 3. 

This semigroup possesses only trivial 
(one-element) subgroups. We must find a way 
of transforming this semigroup to a better one. 

Transformational Equivalence 

Although the representations TOH1 and 
TOH2 are structurally dissimilar, they both 
have the Towers of Hanoi as a model, and thus 
map the states of the Towers of Hanoi in a 
logically equivalent fashion. We state this 
precisely with the following definitions: 

Definition. Given two semigroups S1 and S2 
acting on Q1 and Q2, respectively, a 
function f: Q1 → Q2 is said to be a 
transformational reduction if for all p, q ∈ 
Q1, if q is reachable from p via S1 then 
f(q) is reachable from f(p) via S2. 

Definition. Two semigroups S1 and S2 acting 
on Q1 and Q2, respectively, are said to be 
transformationally equivalent if there exist 
transformational reductions f: Q1 → Q2 
and g: Q2 → Q1. 


The preservation of reachability guarantees 
that any solution in one representation is a 
solution in the other. We will call a 
transformational reduction a t-reduction, and 
transformational equivalence will similarly be 
called a t-equivalence. Semigroup morphisms 
are t-reductions; however, not all t-reductions 
are semigroup morphisms. For example, TOH1 
and TOH2 are t-equivalent, but there are no 
semigroup morphisms between them (there can 
be no function from A1 to A2, or from A2 to 
A1). Neither is a simulation or abstraction of 
the other. 
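
For finite state sets, the definition of a t-reduction can be tested by computing reachability. The sketch below is illustrative: it encodes a state as (large peg, small peg), which is an assumed encoding rather than the numbering used earlier, models the TOH1 and TOH2 actions as partial functions on these states, and takes "reachable" to mean reachable by a nonempty sequence of actions. The names are hypothetical.

```python
from itertools import permutations

PEGS = (1, 2, 3)
# Assumed encoding (not the paper's numbering): a state is (large_peg, small_peg).
STATES = [(big, small) for big in PEGS for small in PEGS]

def right(p):
    return p % 3 + 1

def left(p):
    return (p - 2) % 3 + 1

def toh1_x(q):
    big, small = q
    return (big, right(small))

def toh1_y(q):
    big, small = q
    dest = left(big)
    return None if small in (big, dest) else (dest, small)

def move_top(src, dst):
    """TOH2 action: move whatever disk is on top of peg src to peg dst."""
    def action(q):
        big, small = q
        if small == src:               # the small disk is always on top of its peg
            return (big, dst)
        if big == src and small != dst:
            return (dst, small)        # the large disk may not land on the small one
        return None
    return action

TOH1 = [toh1_x, toh1_y]
TOH2 = [move_top(a, b) for a, b in permutations(PEGS, 2)]

def reachability(actions):
    """Map each state p to the states reachable from p by a nonempty action sequence."""
    reach = {}
    for p in STATES:
        seen, frontier = set(), [p]
        while frontier:
            q = frontier.pop()
            for act in actions:
                n = act(q)
                if n is not None and n not in seen:
                    seen.add(n)
                    frontier.append(n)
        reach[p] = seen
    return reach

def is_t_reduction(f, reach1, reach2):
    return all(f(q) in reach2[f(p)] for p in STATES for q in reach1[p])

identity = lambda q: q
R1, R2 = reachability(TOH1), reachability(TOH2)
t_equivalent = is_t_reduction(identity, R1, R2) and is_t_reduction(identity, R2, R1)
only_x = reachability([toh1_x])
```

In this model the identity map is a t-reduction in both directions, so the two action sets are t-equivalent. By contrast, the identity is not a t-reduction from TOH1 to the system containing only "x", since the large disk can then never be moved.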

Computation of t-reductions can be 
extremely expensive. By restricting 
(specializing) and combining (by disjunction) 
elements of the semigroup A2 of 
representation TOH2, we can transform A2 in 
a general way to obtain any semigroup of 
actions that transforms Q2 in a similar manner; 
however, the number of ways of transforming a 
set of partial functions in this way is 
hyperexponential in the number of elements of 
A2. To make this problem tractable, we 
proceed by investigating one class of 
semigroups at a time. We examine the 
structure of semigroups of that class, and 
construct an algorithm that transforms that 
structure into t-equivalent semigroups of a 
class with superior computational properties. 
In the next section, we will describe such an 
algorithm, which transforms inverse 
semigroups into t-equivalent groups. 

Transforming Inverse Semigroups 
into t-equivalent Groups 

As we have seen, finite group 
representations possess an excellent matrix 
representation theory that permits efficient 
computation of serial algorithms. Finite group 
representations also possess another very 
useful property: all the actions in a 
transformation group are totally defined. The 
absence of partially defined actions means that 
there are no constraints on application of 
actions, and therefore a problem solver need 
not test actions for applicability when 
generating and testing possible actions. In the 
AI literature, this is referred to as "embedding 
the constraints in the generator." Testing
partial actions is responsible for much of the
time spent by search algorithms. For example, 
a production system spends much of its time 
attempting to instantiate rules that do not fully 
match. Thus, if a task admits a group 
representation, it is very desirable to find that 
representation. Consider a task that admits an 
inverse semigroup representation. 

Definition. A semigroup S is an inverse 
semigroup if for any element a of S, there 
exists an element b of S such that aba = a. 

Many interesting tasks admit inverse 
semigroup representations, including many AI 
tasks, e.g., the Towers of Hanoi, Rubik's Cube, 
the Missionaries and Cannibals, Fool's Disk, 
the Blocks World, and the 8-Puzzle. Also 
included are many motion and assembly tasks, 
e.g., parking a car. Intuitively, a task admits 
an inverse semigroup representation if it is 
true that whenever any sequence of actions s is 
performed in a state q, there exists a sequence 
of actions w that will return to state q. 
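
The condition in the definition can be checked mechanically on a small example. The following is a minimal sketch, assuming that actions are encoded as tuples over the states 0..n-1 with None marking an undefined state, and that words are applied left to right; the function names and both example semigroups are illustrative and are not taken from the paper.

```python
def compose(f, g):
    """Return the partial map 'f then g'. Maps are tuples; None means undefined."""
    return tuple(None if image is None else g[image] for image in f)


def generate(generators):
    """Close a collection of partial maps under composition."""
    elements = set(generators)
    frontier = list(elements)
    while frontier:
        a = frontier.pop()
        for g in generators:
            product = compose(a, g)
            if product not in elements:
                elements.add(product)
                frontier.append(product)
    return elements


def satisfies_definition(elements):
    """Check that every a has some b in the semigroup with aba = a."""
    return all(
        any(compose(compose(a, b), a) == a for b in elements)
        for a in elements
    )


# Two states: a swap plus the identity restricted to state 0 (hypothetical example).
swap, keep_zero = (1, 0), (0, None)
reversible = generate([swap, keep_zero])

# One action that can never be undone: state 0 -> 1, undefined on state 1.
one_way = generate([(1, None)])

print(satisfies_definition(reversible), satisfies_definition(one_way))
```

The first semigroup passes the test because every action can be undone; the second fails because no element returns state 1 to state 0, which matches the intuitive reading given above.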

We state the following theorem, but omit 
the lengthy proof to save space. 

Theorem. Any task admits a finite inverse 
semigroup representation iff it admits a 
finite group representation. 

In the next section, we will illustrate the 
procedure for transforming inverse semigroups 
to groups with two algorithms, which form the 
core of the proof of the theorem. 

Algorithms 

The transformation of inverse semigroups 
into groups is illustrated on various 
representations for the Towers of Hanoi. 

Reformulating TOH1 

Consider D2 of TOH1. The primitive 
idempotents are in nontrivial subgroups. The 
reformulation algorithm in this case is: 

• Compute Green's relations.

• Find the primitive idempotents.

• Find the generators of the
corresponding subgroups.

• Select a coordinate system
originating at one of the subgroups.

• Map each generator and all its
corresponding generators under the
coordinate system to one new label.

The primitive idempotents are shown in 
bold type in Figure 12. The generators of the
subgroups are xxy, xyx, and yxx. The 
renaming process in this case just relabels 
these three to one new label, forming the 
disjunctive macro: 

Define z = case {
    little disk left of large disk: xxy
    little disk on large disk: xyx
    little disk right of large disk: yxx }

This new action is globally applicable, and 
moves the two disks so that their relative 
position is unchanged. The identification of 
"the relative position of the two disks" as the 
discriminating feature is not addressed in this 
paper: it will be addressed in part 2, which 
will deal with the syntactic aspects of 
reformulation. The present paper is concerned 
only with the functions that features must 
compute, not with the formulae for computing 
these functions. 

This construction gives a partial morphism 
from A to a group generated by z, with the 
relation zzz = 1. This partial map is defined 
only on the nine elements contained in the 
three group H-classes. Any such partial map 
can be extended with the identity map on all 
totally defined actions. In TOH1, this means 
mapping the action "x" to itself, giving a
group G1, generated by x and z, with the
relations x^3 = 1, z^3 = 1, xz = zx. In this case,
the result is a total morphism on A.


G1   |  1     z      z^2
-----+---------------------
1    |  1     z      z^2
x    |  x     xz     xz^2
x^2  |  x^2   x^2z   x^2z^2
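
The disjunctive macro z and the relations of G1 can be exercised on a concrete encoding. The sketch below is only an illustration under stated assumptions: a state is a pair (small peg, large peg) with pegs numbered 0..2 around a circle, x moves the small disk one peg in one direction, and y moves the large disk one peg in the other direction whenever that move is legal. These definitions of x and y are hypothetical stand-ins, since the definition of TOH1 is not repeated in this section.

```python
from itertools import product

PEGS = 3
STATES = list(product(range(PEGS), repeat=2))  # (small peg, large peg)


def x(state):
    """Assumed action: move the small disk one peg; always applicable."""
    small, large = state
    return ((small + 1) % PEGS, large)


def y(state):
    """Assumed action: move the large disk one peg the other way, if legal."""
    small, large = state
    target = (large - 1) % PEGS
    if small == large or small == target:
        return None  # small disk blocks the move: y is only partially defined
    return (small, target)


def run(word, state):
    """Apply a sequence of actions left to right; None if any step is illegal."""
    for action in word:
        state = action(state)
        if state is None:
            return None
    return state


def z(state):
    """Disjunctive macro: pick the word according to the relative position."""
    small, large = state
    offset = (small - large) % PEGS
    if offset == 2:      # little disk left of large disk
        word = (x, x, y)
    elif offset == 0:    # little disk on large disk
        word = (x, y, x)
    else:                # little disk right of large disk
        word = (y, x, x)
    return run(word, state)


print(all(z(q) is not None for q in STATES))  # z is globally applicable
```

Under these assumptions y is undefined in some states while z is defined everywhere, and z has order three and commutes with x, as the relations of G1 require.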


As this group is abelian, the set of actions 
of the Towers of Hanoi then decomposes in 
two ways: 

♦ executing the z macro the
necessary number of times to solve
the large disk, then

♦ executing "x" the necessary number
of times to solve the small disk;

or:

♦ executing "x" the necessary number
of times to solve the small disk,
then

♦ executing the z macro the
necessary number of times to solve
the large disk.
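
The two decompositions can be sketched as a tiny planner. This is an illustration only, assuming the same hypothetical encoding as before: states are pairs (small peg, large peg) on pegs 0..2, x shifts the small disk by one peg, and the macro z shifts both disks by one peg in the opposite direction so that their relative position is unchanged.

```python
PEGS = 3


def apply_x(state):
    """Assumed effect of x: the small disk advances one peg."""
    small, large = state
    return ((small + 1) % PEGS, large)


def apply_z(state):
    """Assumed effect of the macro z: both disks shift one peg the other way."""
    small, large = state
    return ((small - 1) % PEGS, (large - 1) % PEGS)


def counts(start, goal):
    """How many z and x applications are needed; independent of their order."""
    n_z = (start[1] - goal[1]) % PEGS
    small_after_z = (start[0] - n_z) % PEGS
    n_x = (goal[0] - small_after_z) % PEGS
    return n_z, n_x


def execute(plan, state):
    """Run a plan given as a string over the letters 'x' and 'z'."""
    for letter in plan:
        state = apply_z(state) if letter == "z" else apply_x(state)
    return state


def both_plans(start, goal):
    """Return the 'z first' and the 'x first' plan for the same task."""
    n_z, n_x = counts(start, goal)
    return "z" * n_z + "x" * n_x, "x" * n_x + "z" * n_z


print(both_plans((0, 0), (2, 1)))
```

Because the two generators commute, both orderings reach the goal with the same number of macro applications; no plan in this sketch needs more than four steps, although, as noted in the text, such plans are not optimal in primitive moves.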

These decompositions do not lead to 
optimal solutions (they can be improved by 
including both right and left moves for both 
disks); however, they possess the usual 
advantage of task decompositions: they clarify 
and simplify the task, leading to reduced 
sensing and planning time. The partiality of 
the actions in TOH1 is encapsulated within 
macros in this new representation, thereby 
eliminating subgoal interference by moving 
the constraints to the generator. 

Reformulating TOH2 

Consider TOH2. The groups containing the 
primitive idempotents are all trivial. In this 
case, a reformulation algorithm is:

• Compute Green's relations.

• Find the primitive idempotents.

• Find a minimal word x_1 x_2 ... x_n for
one of the primitive idempotents.

• Map the set of functions
x_2 x_3 x_4 ... x_n x_1, x_3 x_4 ... x_n x_1 x_2, etc. to
one new symbol.

All the primitive idempotents are mapped 
to the identity function. All primitive 
idempotents can be found by cyclically 
permuting a minimal word for a primitive 
idempotent. Also, this word gives a cycle of Q 
(executing the actions of the word visits each 
state of Q exactly once). We restrict these nine 
functions to single states by multiplying on the 
left by the appropriate primitive idempotents, 
and then map these nine functions to one new 
symbol v, giving a cycle that visits each 
element of Q exactly once, so that each v is a 
counterclockwise arrow around the state graph 
for the 2-disk Towers of Hanoi. 

Notice that three of the elements of the 
original semigroup are not mapped; they are 
not necessary for reachability, but only for 
efficiency. This gives a cyclic group G2 of
order 9 generated by v, which decomposes into
two cyclic groups of order three:

G2   |  1     v^3   v^6
-----+-------------------
1    |  1     v^3   v^6
v    |  v     v^4   v^7
v^2  |  v^2   v^5   v^8

Once again, these actions are totally 
defined, so subgoal interference has been 
eliminated and constraints have been hidden 
by encapsulating them in macros. 

This is isomorphic to the group found in
the previous example from TOH1, with z = v^3
and x = v^2. But this group representation is not
related to the group representation from TOH1
by a homomorphism of transformation monoids.
That this is so is evident from the way the two
groups map the states. Group G1 maps state 1
into state 3 via action x^2, but G2 maps state 1
into state 4 via action v^2. This shows that
non-homomorphic transformation groups can
exist in the category of representations for a
task. Although these two groups are
isomorphic as abstract groups, they possess
different computational properties when acting
on the states of the task, e.g., the average path
length between any two states in G1 is shorter
than in G2. The morphisms of transformation
monoids distinguish properly between these
two representations, thus illustrating the
usefulness of the formalism for reasoning
about representations.
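
The path-length comparison can be reproduced with a breadth-first search over the nine states. The sketch below rests on assumptions that are not in the paper: G1 is modelled by generators x and z acting on pairs of peg indices, G2 by a single generator v acting as a 9-cycle on the integers 0..8, only forward applications of generators count as steps, and the zero-length path from a state to itself is included in the average.

```python
from collections import deque


def average_path_length(states, generators):
    """Mean shortest-path length over all ordered pairs of states."""
    total = 0
    pairs = 0
    for start in states:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            q = queue.popleft()
            for g in generators:
                nxt = g(q)
                if nxt not in dist:
                    dist[nxt] = dist[q] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist)
    return total / pairs


# Assumed action of G1: x moves the small disk, z shifts both disks.
g1_states = [(s, l) for s in range(3) for l in range(3)]
g1_generators = [
    lambda q: ((q[0] + 1) % 3, q[1]),
    lambda q: ((q[0] - 1) % 3, (q[1] - 1) % 3),
]

# Assumed action of G2: v steps once around a cycle through all nine states.
g2_states = list(range(9))
g2_generators = [lambda q: (q + 1) % 9]

g1_average = average_path_length(g1_states, g1_generators)
g2_average = average_path_length(g2_states, g2_generators)
print(g1_average, g2_average)
```

With these conventions the two-generator representation gives an average of 2.0 steps and the single-cycle representation an average of 4.0, in line with the claim that paths in G1 are shorter.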

Summary 

We have described a research program 
pursuing an algebraic approach to reasoning 
about representation change. There are three 
advantages to this approach. First, it ties in to 
an existing theory of semigroups that is 
general and intuitive. We hope that this paper 
has demonstrated the intuitive advantages of 
this approach, particularly in the use of 
coordinate systems to characterize 
reformulation. 

Second, we can use this theory for 
classification. We classify representations by 
the structure of their transformation monoids, 
and classify tasks according to the 
representations they admit. We can also 
classify representation changes. For example, 
we have classified reformulations as 
coordinate transformations if they transform 
the presentation of the transformation monoid, 
and as deformations, if they transform the 
structure of the monoid. 

Third, we can use this theory to construct
algorithms for representation change. For
example, we showed how to use group
representation theory to automatically abstract
a group representation, and we showed how to
move constraints from the tester to the
generator for inverse semigroups.

This paper has dealt with the semantics of
representation change, as embodied in the
structure of semigroups of actions. Part 2 will
deal with the agent-dependent features used to
encode states and actions, which are embodied
in strings of symbols over alphabets.


Abstract

The goal of the ARIES Simulation Component
(ASC) is to uncover behavioral errors by “running”
a specification at the earliest possible points
during the specification development process. The
problems to be overcome are the obvious ones - 
the specification may be large, incomplete, un- 
derconstrained, and/or uncompilable. This paper 
describes how specification reformulation is used 
to mitigate these problems. ASC begins by de- 
composing validation into specific validation ques- 
tions. Next, the specification is reformulated to 
abstract out all those features unrelated to the 
identified validation question thus creating a new 
specialized specification. ASC relies on a precise 
statement of the validation question and a care- 
ful application of transformations so as to pre- 
serve the essential specification semantics in the 
resulting specialized specification. This technique 
is a win if the resulting specialized specification is 
small enough so the user may easily handle any 
remaining obstacles to execution. This paper will 
(1) describe what a validation question is, (2) out- 
line analysis techniques for identifying what con- 
cepts are and are not relevant to a validation ques- 
tion, and (3) identify and apply transformations 
which remove these less relevant concepts while 
preserving those which are relevant. 

Introduction 

Validation at the requirements level is often character- 
ized as validation with respect to the client’s or stake- 
holder’s intent. The goal of specification validation is 
to identify those aspects of the specification which do 
not conform to the client’s intent and then to make ap- 
propriate changes. More practically, this boils down to 
uncovering bugs in the specification and fixing them. 
The goal of this work is to address a specific subclass 
of specification errors that have not previously been 
satisfactorily addressed. In particular, this work ad- 
dresses identifying errors in the dynamic behavior of 
a high-level specification. This work will distinguish
itself from related works by being able to handle large,
very-high-level specifications. This is done by making
explicit specific validation questions which focus validation
activities sufficiently that traditional
validation techniques, like simulation and direct execution,
are tractable.

The problem of identifying errors in the specifica- 
tion and the cost of finding these later during the de- 
velopment process is well documented [Boehm, 1981]. 
Among these errors, the most difficult to identify early 
on are those which concern behavior. In general these 
include: (1) inconsistency between specification com- 
ponents, (2) incompleteness with regard to known sce- 
narios, and (3) inconsistency between requirements 
and their realization in the specification. 

The work described herein will uncover behavioral errors
by “running” a specification at the earliest possible
points during the specification development process.
The problems to be addressed are the obvious
ones - the specification may be large, incomplete, un- 
derconstrained, and uncompilable. These problems are 
addressed via a four step process. First, the validation 
activity is decomposed into specific validation ques- 
tions. Second, the specification is reformulated to ab- 
stract out all those features unrelated to the identi- 
fied validation question thus creating a new specialized 
specification. Third, the specialized specification is ex- 
ecuted with the purpose of proving or disproving the 
validation question. And finally fourth, since the spe- 
cialized specification was constructed in a disciplined 
manner, one may now infer the result of the validation 
question about the original specification. 

The general feel of the interaction is more like a
debugging session, particularly early in the development.
The goal is to get something running quickly
and easily so as to reveal behaviors implied by the 
specification and make them accessible to end users 
and stake holders for early validation (or more likely, 
early error identification). During a typical validation 
session, the specialized specification and its validation 
question are executed. The simulation system, using
the validation question, will guide the execution toward
satisfaction of the validation question. When this is
not possible, the simulator will point out how the validation
question has been violated. The stake holder
and analyst will observe the execution. When a valida- 
tion question is not satisfiable, the analyst will be able 
to explore the behavior space to understand why this 
is the case. Appropriate changes may then be made to 
the specification and the specialized specification con- 
struction process replayed. This is then followed by 
re-execution of the specialized specification. Naturally 
this process may be repeated. 

The abstraction or reformulation process employed 
during specialized specification construction is the 
heart of the ARIES Simulation Component (ASC, pronounced
“ask”). It relies on a precise statement of
the validation question and a careful application of 
transformations so as to preserve the essential speci- 
fication semantics in the resulting specialized specifi- 
cation. This technique is a win if the resulting spe- 
cialized specification is small enough so that the user 
may easily handle any remaining obstacles to execu- 
tion. This paper will (1) describe what a validation 
question is, (2) outline analysis techniques for iden- 
tifying what concepts are and are not relevant to a 
validation question, and (3) identify and apply trans- 
formations which remove these less relevant concepts 
while preserving those which are relevant. 

The work described in this paper is a component of a 
larger effort called ARIES [Johnson et al, 1991] which 
is concerned with the overall task of requirements ac- 
quisition and specification development and validation. 
Requirements may be stated informally and then grad- 
ually formalized and elaborated. Validation is facili- 
tated via a variety of graphical and textual presenta- 
tions. Elaboration and refinement are supported via 
evolution transformations. Additionally, mechanisms 
for reuse and concept encapsulation have been pro- 
vided. 

The example used throughout this paper is drawn 
from the air traffic control domain, specifically behav- 
iors concerning handoff — passing control of an aircraft 
from one air traffic controller to another. Some of the 
concepts included in the full specification are: control, 
physical location, sensors, tracks, maintaining tracks, 
flight plans, aircraft movement, agents within the air 
traffic control domain, etcetera. 

Validation Questions 

At the beginning of the requirements acquisition pro- 
cess, many requirements are not easily expressed as ab- 
stract, concise, declarative statements of stake-holder 
needs. Rather at this point, requirements are often 
more easily expressed informally as a mix of situations 
and experiences which the stake-holder wishes to have 
handled by the system to be specified. 

Informally, a validation question is any question a 
user or stake-holder may have about the specified sys- 
tem. It could encompass anything he/she believes to 
be pertinent. The goal of a validation question is to
provide a means to ask these questions. Fundamentally,
validation questions are statements of user’s or
stake-holder’s requirements. They are stated in a manner
as similar as possible to the way they are manifest
in the user’s or stake-holder’s real world. And they
hopefully have little dependence on how these requirements
may be realized in the specification. This section
will show how these goals are attained by allowing the 
stake-holder to express his/her requirements via the 
following constructs: scenarios, to describe partial or- 
derings of states and events using both abstract and 
concrete concepts, and assumptions, to support the 
implicit assumptions common in natural language and 
often used when stake-holders describe their needs. 

Consider the following natural language questions. 
They are the intuitive basis from which we will evolve 
formal validation questions. 

• VQ-1: Does handoff occur before the aircraft moves
from its current airspace to an adjacent airspace?

• VQ-2: Once en route to a particular location, is control
maintained throughout?

• VQ-3: Will the system recognize when an aircraft is
out of conformance with its flight plan?

Figure 1: Some Informal Validation Questions 

The above questions illustrate several important 
characteristics of validation questions. First, valida- 
tion questions often implicitly rely on scenarios and 
assumptions to provide a narrowed context. Second, 
validation questions often use concrete and/or qualita- 
tive instances to focus on specific, relevant attributes 
of the specified system. And finally third, validation 
questions often embody some expected interaction that 
the analyst is trying to stress. The remainder of this 
section will describe how validation questions are de- 
scribed in terms of scenarios and assumptions. 

Scenarios 

Scenarios are a partial ordering of events and/or states. 
They allow one to describe a complex sequence of activ- 
ities at an arbitrary level of detail without necessarily 
making commitments regarding their causal relation- 
ships. 

This section will define the semantics of a scenario
with respect to state transition diagrams (see footnote 1).
The specific semantics is determined by the scenario mode.
Alternatives include comparative, restrictive, or pre- 
scriptive. The mode is selected by the analyst dur- 
ing formulation of the validation question. Each mode 
constrains the behavior space of the specification in a 
progressively more restrictive manner. 


Footnote 1: Other notations for scenarios are also available and are
often used. They are isomorphic with STDs.

• Comparative has no effect on the behavior space
of the simulated specification, but acts as a watchdog
informing the user as to the satisfaction or partial
satisfaction (see footnote 2) of a scenario during simulation. In
this case satisfaction of a node determines the current
node. Satisfaction of the current node is neither necessary nor
sufficient to advance to the next node.

• Restrictive means that nondeterminism within the 
behavior space is pruned so that if the scenario may 
be satisfied it will be. Basically, satisfaction of each 
node is necessary but not sufficient to advance to 
a following node. Informally this mode is best de- 
scribed as a procedural invariant. 

• Prescriptive means that the simulator advances 
the scenario from one node to the next irrespective of 
the state of the simulation. More formally, the sim- 
ulator treats the satisfaction of a node as necessary 
and sufficient for advancing to the next node. Op- 
erationally, advancing to a transition node implies 
that the corresponding event is invoked. When the 
event is completed the following state node is made 
true. 

Main validation question The validation question
labeled as VQ-1 in figure 1 is formalized as the state
transition diagram shown in figure 2. States within-s1
and within-s2 are the qualitative values for the aircraft
being in sectors s1 and s2 respectively. Transitions
handoff and alert-controller represent the events of the
same name and the following states represent completion
of these same events. Transitions m-1 and m-2
represent any event that would result in the final state
within-s2. During simulation, ASC will drive this state
transition diagram to reflect the state of the simulation.
If an illegal transition occurs the analyst will be
informed.
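
The idea of driving a state transition diagram and reporting illegal transitions can be sketched in a few lines. The diagram of figure 2 is not reproduced here, so the transition table below is a simplified, hypothetical fragment: it keeps only the handoff and m-1 transitions, invents the intermediate state name handoff-complete, and leaves out alert-controller and m-2.

```python
# Hypothetical fragment of a VQ-Handoff diagram: (state, event) -> next state.
TRANSITIONS = {
    ("within-s1", "handoff"): "handoff-complete",
    ("handoff-complete", "m-1"): "within-s2",
}


def drive(events, start="within-s1"):
    """Advance the diagram over observed events; report the first illegal one."""
    state = start
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            message = f"illegal transition {event!r} in state {state!r} at step {index}"
            return state, message
        state = next_state
    return state, None


print(drive(["handoff", "m-1"]))  # handoff happens before the sector change
print(drive(["m-1"]))             # aircraft moves before any handoff
```

In the second run the aircraft would change sectors before handoff, so the sketch stops in within-s1 and returns a message for the analyst instead of advancing.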

The goal here has been to express typical, critical 
situations that the user wants to be sure are handled 
in a specific way. This could have been done in terms
of concepts at any level of abstraction. Typically as
the specification moves closer to completion, the validation
question may become more complex and may
be expressed in terms of lower-level concepts.

Driving scenarios Driving scenarios are prescriptive
scenarios, typically used to model actions outside
the scope of the current specialized specification. Figure 3
shows two driving scenarios that are used within
this example. The driving scenario commits to manage
specific concepts. In this case those concepts are track-position
and the qualitative values derived from track-position
(e.g., top-of-block-altitude and
within-accept-handoff-computed-point-distance). When a driving
scenario manages a concept it supersedes all other specification
concepts which attempt to influence the same
concept. This information will be used later during
specification reformulation.

Footnote 2: Satisfaction of a state means the predicate associated
with the state is true. Satisfaction of a transition means the
event associated with the transition has been invoked. Satisfaction
of a scenario means that the states and transitions
of the scenario have been satisfied in the order specified by
the scenario.

Figure 2: Validation Question - VQ-Handoff

procedure DS-LEVEL-3()
  := steps(
       insert within-s1 ac1;
       insert top-of-block-altitude ac1;
       delay();
       insert enter-new-airspace ac1)

procedure DS-LEVEL-4()
  := steps(
       insert within-s1 ac1;
       insert top-of-block-altitude ac1;
       delay();
       insert within-accept-handoff-computed-point-distance ac1;
       insert within-s2 ac1)

Figure 3: Driving Scenarios for VQ-Handoff 

The level of abstraction at which driving scenarios 
operate is one of the primary influences on the level 
at which simulation will be done. Instead of driving
track-position, the analyst could decide to drive sensor-reports.
This would result in a larger, more detailed
specialized specification which would include processing
of sensor-reports into tracks.

Scenarios to constrain nondeterminism Specifications
are highly underconstrained, particularly early
in their development. ASC allows one to execute a
specification in spite of this by providing various mechanisms
to constrain the nondeterminism within the context
of a validation question. One of these mechanisms
is a restrictive scenario, which acts as a procedural constraint
on the behavior space.

One example of this is the Handoff transition in fig- 
ure 2. Handoff is not actually an event but rather a 
restrictive scenario which is constraining the simula- 
tion to only consider handoffs consisting of an initiate 
and accept phase (see figure 4). This scenario pre- 
cludes handoffs from being canceled or rejected. More
complex scenarios would deal with this after first per-
forming validation on this simpler case. 

scenario handoff(ac: track)
  := steps[
       automatic-init-handoff(ac);
       accept-handoff(ac, any controller, any controller)]

Figure 4: Restrictive Scenario 


Assumptions 

Assumptions allow the analyst to codify what are often 
implicit assumptions made by developers as they build 
rapid prototypes. The advantage of this approach is 
that it documents said assumptions. Once recorded,
the analyst can separately validate the assumptions
with stake-holders within the context of the current
validation question. This way the analyst can be
sure that assumptions do not trivialize the basic intent 
of the validation question. A later section of this paper 
will show how assumptions are used during specifica- 
tion reformulation. 

invariant FIXED-SET-OF-TRACKS

for-all (t1: track) element-of(t1, {ac1, ac2})

Figure 5: Assumption for VQ-Handoff

Assumptions are expressed as invariants. Figure 5 
contains one of the assumptions used in the current 
example. Since we are not validating track-processing,
we can relax some of the constraints on tracks and for
now constrain the number of tracks in the simulation
model. Ac1 and ac2 are defined as tracks in an unshown
initialization scenario.
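
An invariant of this kind amounts to a predicate that must hold in every state of the simulation model. As a rough sketch, assuming that the set of tracks in a state is available as a plain Python collection of names (the function name and the string encoding of tracks are illustrative, not part of ASC):

```python
ALLOWED_TRACKS = frozenset({"ac1", "ac2"})


def fixed_set_of_tracks(tracks):
    """Invariant: every track in the model is one of the two declared tracks."""
    return all(track in ALLOWED_TRACKS for track in tracks)


print(fixed_set_of_tracks(["ac1", "ac2"]))         # assumption holds
print(fixed_set_of_tracks(["ac1", "ac2", "ac3"]))  # a third track violates it
```

A simulator could evaluate such a predicate after each step and prune or report any behavior that introduces a track outside the fixed set.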


Influence Analysis 


The previous section described how a validation ques- 
tion is formalized. This section will show how the 
validation question is used to reformulate the current 
specification into a specialized specification which is 
simulatable. We begin with influence analysis. 

Brooks in [Brooks, 1986] warns that descriptions of 
software that abstract away its complexity often ab- 
stract away its essence. Influence analysis is a means 
of allowing the analyst to see through this complexity 
to distinguish between concepts which are most rele- 
vant to the validation question and those which are not. 
Once identified, ASC provides reformulation transfor- 
mations which remove those concepts which are not 
relevant. 


In general, formally showing whether or not one concept
affects another is undecidable. Even at an informal
level, causality is a very hard problem. Influences
finesse this issue by relying on rules which are easily
computable and which generate all potential influences
rather than making claims about actual influences. As
such, the resulting influence graph should be considered
a conservative representation of concept influences;
that is, it may indicate influences which are not actually
possible, but it is safe, in that it will not fail
to indicate the presence of an influence that does exist.

Once the initial influence graph is generated, more 
knowledge intensive approaches are applied to remove 
many of those potential influences which are not actual 
influences. 

Another problem in extracting the influence graph 
is that influence paths could be through arbitrarily 
many intermediaries and in an incomplete specifica- 
tion would be inherently suspect. Rather than deal 
with this problem, ASC allows the analyst to limit the 
path length it will look at during analysis. Granted,
this has the horizon effect, but this can be minimized
(see section Horizon Effect Addressed).
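
Limiting the path length corresponds to a depth-bounded search backwards from the concept of interest. The sketch below assumes the influence graph is a dictionary from each concept to the set of concepts it may influence; the function name and the small example graph are hypothetical.

```python
from collections import deque


def influencers_within(graph, target, max_path_length):
    """Map each concept that can reach target within the bound to its distance."""
    # Reverse the edges so the search can walk from the target to its sources.
    reverse = {}
    for source, targets in graph.items():
        for influenced in targets:
            reverse.setdefault(influenced, set()).add(source)

    found = {}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_path_length:
            continue  # the horizon: do not look past this path length
        for source in reverse.get(node, ()):
            if source not in seen:
                seen.add(source)
                found[source] = depth + 1
                queue.append((source, depth + 1))
    return found


# Hypothetical chain of potential influences leading to a validation question.
example_graph = {
    "sensor-report": {"track-position"},
    "track-position": {"within-s2"},
    "within-s2": {"VQ-Handoff"},
}
print(influencers_within(example_graph, "VQ-Handoff", 2))
```

With a bound of 2 the search finds within-s2 and track-position but not sensor-report, which lies beyond the horizon; raising the bound to 3 brings it back into view.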


Influence Definition 

An influence identifies the conduits through which one
concept affects another during execution. ASC divides
these conduits up into three classes: information, con- 
trol, and miscellaneous. This section will informally 
characterize each of these classes and then illustrate 
them via an example. ASC has operationally formal- 
ized these concepts based on the specification language 
Reusable Gist [Johnson and Feather, 1991]. 

• Information influences are concerned with the 
flow of information between concepts. Stated an- 
other way, when information changes, how does it 
percolate through the system? Some examples of 
these types of influences are: database updates, as- 
signment statements, and definitional use of data 
declarations (i.e., relations, types, and instances) by 
other data declarations. 

• Control influences are concerned with if and when 
behaviors may occur. Some examples of this class of 
influences are preconditions on an event, invocation 
of an event, invariants, and conditional statements.
(Note that the granularity of influences is at the level of
declarations. Thus influences on or by statements are
reflected as influences on or by the event which contains
the statement.)

• Miscellaneous influences are concerned with influences
on and by the validation question. Most
influences are onto validation questions with driving
scenarios being the exception.

Figure 6 is the Reusable Gist definition of the event
accept-handoff. Figure 7 shows a paraphrase of this
event. The resulting primitive influence graph is shown
in figure 8.

Demon ACCEPT-HANDOFF(track,
                     current-controller: controller,
                     receiving-controller: controller)
precondition controlled(track, current-controller) and
             handoff-in-progress(track,
                                 current-controller,
                                 receiving-controller)
postcondition controlled(track, receiving-controller)
steps(track.controlled ← receiving-controller;
      remove handoff-in-progress(track,
                                 current-controller,
                                 receiving-controller);
      track.track-status ← 'normal)

Figure 6: Reusable Gist definition of the event Accept-Handoff

ACCEPT-HANDOFF is an action of the system. Its participants are
a track, a controller CURRENT-CONTROLLER and a controller
RECEIVING-CONTROLLER. To perform accept-handoff, the
system sequentially does the following three steps.

1. The system assigns the controlled of TRACK to
RECEIVING-CONTROLLER.

2. The system deletes the fact that the HANDOFF-IN-PROGRESS
relation associates TRACK, CURRENT-CONTROLLER and
RECEIVING-CONTROLLER.

3. The system assigns the track-status of TRACK to normal.

There is a precondition that TRACK must be related by the
CONTROLLED relation to CURRENT-CONTROLLER and the HANDOFF-IN-PROGRESS
relation must associate TRACK, CURRENT-CONTROLLER and
RECEIVING-CONTROLLER. There is a postcondition that TRACK must be
related by the CONTROLLED relation to RECEIVING-CONTROLLER.

Figure 7: Paraphrase of the event Accept-Handoff

The influence graph of figure 8 is most similar to 
de Kleer’s mechanism graph from his work in qualita- 
tive reasoning [de Kleer, 1986; de Kleer and Brown, 
1986]. The mechanism graph shows the causal influences
between concepts. A vertex contains an information
value which represents a specific circuit component
attribute (e.g. the voltage or current at a given
component). Edges represent how a change in a vertex 
value is propagated to adjacent vertices. Edges are de- 
rived from either component models or domain specific 
heuristics. In influence graphs, a vertex represents a 
specification concept declaration (or fragment). Edges 
represent how a concept influences either the behavior 
or value of another concept. 

The influence graph (figure 8) shows most of
the immediate influences on and by accept-handoff.
Receiving-controller, current-controller, and track are
parameters of the event. Enabling-pred-of-accept-handoff
is a composite node representing the precondition
of the event. Controlled, and, and handoff-in-progress
are relations referenced by the event. Edges
within the graph represent the direction in which influences
are propagated. Note that how each influence
affects a given node is not represented. This is
in fact outside the capability of influence analysis in
ASC. Nonetheless, this still provides a great deal of
information to the analyst when creating a specialized
specification as we will see later.
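
To make the shape of such a graph concrete, the sketch below derives a set of labelled edges from a simplified record of the accept-handoff declaration. The edge rules are an assumption made for illustration (parameters and updated relations as information influences, the enabling predicate as a control influence); they are not ASC's actual rules, and the record format is hypothetical.

```python
def primitive_influences(event):
    """Return a set of (source, target, kind) edges for one event declaration."""
    name = event["name"]
    predicate = f"enabling-pred-of-{name}"
    edges = set()
    for parameter in event["parameters"]:
        edges.add((parameter, name, "information"))
    for relation in event["precondition_reads"]:
        edges.add((relation, predicate, "information"))
    # The precondition decides if and when the event may occur.
    edges.add((predicate, name, "control"))
    for relation in event["updates"]:
        edges.add((name, relation, "information"))
    return edges


# Simplified reading of the declaration in figure 6 (an assumed encoding).
ACCEPT_HANDOFF = {
    "name": "accept-handoff",
    "parameters": ["track", "current-controller", "receiving-controller"],
    "precondition_reads": ["controlled", "handoff-in-progress", "and"],
    "updates": ["controlled", "handoff-in-progress", "track-status"],
}

for edge in sorted(primitive_influences(ACCEPT_HANDOFF)):
    print(edge)
```

As in the text, each edge records only the direction of an influence, not how the target is affected.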



Figure 8: Primitive Influence Graph of the event accept-handoff


Automated Graph Abstraction 

Though the graph in figure 8 could be used as is, it 
shows many influences which really do not drive the 
dynamic behavior of the specification. This section will 
describe some of the influence abstraction rules which 
are applied automatically by ASC. Figure 9 shows the 
resulting influence graph. It is this graph, not the pre- 
vious one, which the analyst is first shown after influ- 
ence analysis. 



Figure 9: Influence Graph of the event accept-handoff 

Below are some of the abstractions which are auto- 
matically applied during influence analysis. 
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• Remove influences on seif. 

• Remove influences from concepts in the Predefined 
folder (i.e., commonly used relations, e.g., and). 

• Remove static concepts which have no influences on
them. 3

• Remove variables and parameters which are not ex- 
plicitly influenced. 4 
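The four rules above can be sketched as a single pass over a set of influence edges. This is only an illustration under stated assumptions: the function name, the `kind` labels, and the simplification that any remaining incoming edge counts as an explicit influence are not taken from ASC.

```python
def abstract_automatically(nodes, edges, predefined):
    """Apply the four automatic abstraction rules (simplified sketch).

    nodes: dict mapping concept name -> kind
           ('static', 'dynamic', 'variable', 'parameter', 'event').
    edges: set of (source, target) influence pairs.
    predefined: names of concepts in the Predefined folder.
    """
    # Rule 1: remove influences on self.
    edges = {(s, t) for (s, t) in edges if s != t}
    # Rule 2: remove influences from predefined concepts.
    edges = {(s, t) for (s, t) in edges if s not in predefined}
    influenced = {t for (_, t) in edges}
    # Rules 3 and 4: drop static concepts, variables and parameters
    # that nothing influences (assumption: every edge is "explicit").
    dropped = {
        name
        for name, kind in nodes.items()
        if kind in ("static", "variable", "parameter") and name not in influenced
    }
    kept_nodes = {n: k for n, k in nodes.items() if n not in dropped}
    kept_edges = {(s, t) for (s, t) in edges if s in kept_nodes and t in kept_nodes}
    return kept_nodes, kept_edges


nodes = {
    "accept-handoff": "event",
    "and": "static",
    "track": "parameter",
    "controlled": "dynamic",
}
edges = {
    ("accept-handoff", "accept-handoff"),
    ("and", "accept-handoff"),
    ("track", "accept-handoff"),
    ("accept-handoff", "controlled"),
}
kept_nodes, kept_edges = abstract_automatically(nodes, edges, predefined={"and"})
```

In this toy input only the event and the relation it updates survive, which mirrors the kind of reduction reported for the accept-handoff graph.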

Interactive Graph Abstraction 

Not all abstractions can be done in an automatic fashion.
Typically, the presence of certain influences indicates
either an error in the specification or an opportunity to
apply an abstraction. The analyst must make
these decisions. ASC identifies these cases during au- 
tomatic influence abstraction and then posts notifica- 
tions via an agenda mechanism. When the analyst is 
ready, he/she may view the agenda and the alterna- 
tive actions recommended by ASC. Recommendations 
typically include suggested transformations which can 
cause the desired effect in either the evolving special- 
ized specification or the underlying specification. Some 
of these interactive suggestions are: 

• When there are no influences on a type or relation 
declaration, suggest that the concept should be de- 
clared static. 

• When there is an influence on a type or relation dec- 
laration, suggest that the concept should be changed 
to dynamic (e.g., explicit or derived relation). 

• When an influence node has only a single input in- 
formation influence and only a single output infor- 
mation influence, suggest that the intermediate node 
be abstracted out and the input and output nodes 
be modified to be a direct influence. 
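The three suggestions above could be generated along the following lines. The function, the suggestion labels, and the `declared` mapping are hypothetical, and the sketch ignores the distinction between information and control influences that ASC makes.

```python
def suggest(nodes, edges, declared):
    """Return (suggestion, concept) pairs for the analyst's agenda.

    nodes: iterable of concept names.
    edges: set of (source, target) influence pairs.
    declared: dict mapping type/relation names -> 'static' or 'dynamic'.
    """
    preds, succs = {}, {}
    for s, t in edges:
        succs.setdefault(s, set()).add(t)
        preds.setdefault(t, set()).add(s)

    suggestions = []
    for name in sorted(nodes):
        n_in = len(preds.get(name, ()))
        n_out = len(succs.get(name, ()))
        if name in declared:
            if n_in == 0 and declared[name] != "static":
                suggestions.append(("declare-static", name))
            elif n_in > 0 and declared[name] != "dynamic":
                suggestions.append(("declare-dynamic", name))
        # A node with one input and one output is a candidate for bypassing.
        if n_in == 1 and n_out == 1:
            suggestions.append(("bypass", name))
    return suggestions


agenda = suggest(
    nodes={"a", "b", "c", "r"},
    edges={("a", "b"), ("b", "c")},
    declared={"r": "dynamic", "c": "static"},
)
```

The sketch only posts suggestions; as in the text, accepting or rejecting each one is left to the analyst.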

For validation question VQ-Handoff of figure 2, in- 
fluence analysis results in 224 influence nodes. After 
automatic abstraction this count is reduced to 97 influ- 
ence nodes with 51 posted suggestions (most of which 
concern suggestions on declaring concepts as dynamic 
or static). After the analyst handles the most obvious 
suggestions, the influence node count is reduced to 77. 
Though an improvement over the starting point of 224 
concepts, there are still a lot of concepts to compile for 
simulation. 

Specification Reformulation 

The previous section's analysis and reformulation were
basically independent of the validation question. This
section will suggest more drastic reformulations which
take advantage of the knowledge implicit in the
validation question.

3 If there is an influence on a static concept, post it as
an error.

4 Explicit information influences are various forms of
assignment. Event parameters are almost always removed
since the dominating influence is the control influence on
the event (e.g., who the caller is).

Modifications, whether motivated by errors discovered in
the specification or by simplifying assumptions, are
accomplished via the application of transformations.
Since ASC is a component of ARIES, it is able
to take advantage of an extensive library of evolution 
transformations [Johnson and Feather, 1990]. These 
transformations formally evolve a specification based 
on specific desired effects. 

An important feature of the reformulation process is 
that not all the effort to build the specialized specifica- 
tion need be thrown away after doing validation. Many 
of the applied transformations are equally valid in both 
the specialized specification and the original specifica- 
tion. ASC allows the analyst to declare during refor- 
mulation if a transformation should be recorded and 
later applied to the original specification. This selec- 
tive record of transformations provides an opportunity 
for these transformations to be replayed on the original 
specification. (This selective replay capability has not 
yet been implemented in ASC.) 

Reformulations Motivated by the 
Validation Question 

Reformulation based on a validation question is anal- 
ogous to how partial evaluation (mixed computation) 
[Ershov, 1985] is able to generate an efficient residual
program based on a more general program and a
subset of its input parameters. This technique is po- 
tentially more powerful because a validation question 
is a richer source of knowledge than just a list of input 
parameters. 

Reformulation based on assumptions. We begin with
VQ-Handoff's assumption (see figure 5) that there will be
a fixed number of tracks which already exist. Figure 10
shows the influence graph of fixed-set-of-tracks and all
of the concepts in the specification it directly
influences, i.e., track, initiate-tracking, and
extract-track-info.

The analyst begins with the event initiate-tracking. 
Influence analysis shows there are no other influences 
on it. Visual inspection reveals to the analyst that it 
creates tracks. Since the assumption says there will 
be no new tracks, this event is superfluous and thus 
should be abstracted out. 5 

The analyst next handles the type track. The impli- 
cation of the assumption is obvious. The type should 
be declared static. Note that this is not generally true 
within the air traffic control domain, but illustrates a 
common result of validation question assumptions. 

The third influenced node is the event extract-track-info.
The analyst attempts to handle it as he/she handled
initiate-tracking. In this case, visual inspection of the
event shows that this event both creates tracks and
assigns them a track-position. At this point the analyst
needs to determine whether he/she will specialize
extract-track-info into a new event which only deals with
track-position or whether the event may be abstracted
away completely. The analyst then displays a new
influence graph as shown in figure 11. This influence
graph shows that extract-track-info influences both track
and track-position. Additionally, track-position is
influenced by the driving scenario ds-level-S. At this
point, remembering that driving scenarios have taken
responsibility to be the sole maintainer of the concepts
they influence, the analyst now removes
extract-track-info from the specialized specification.

5 A theorem prover would be very useful here. Given the
narrowed context, it might be tractable to prove the above
conclusion automatically. This is outside the scope of ASC.

Figure 10: Influence Graph of the assumption
fixed-set-of-tracks



Figure 11: Influence Graph of the event extract-track-info

Reformulation based on driving scenarios. In the current
example, influence analysis shows that the
next-controller relation used by automatic-init-handoff
relies on paired and flight-plan, neither of which is
fully defined. Additional analysis shows that flight-plan
influences only a few other concepts, including
conformance. The analyst decides to abstract out
flight-plan and paired and then model next-controller and
conformance without them.

Two approaches are possible. One is to redefine
next-controller such that it can be derived from concepts
already within the specialized specification; the other
is to directly maintain the relation. In the spirit of
picking the approach which is quick (and hopefully not
too dirty), the latter is chosen. This is easy to do
within the context of the validation question. The
analyst simply includes an assertion as part of the
driving scenario that next-controller(ac-1, c1).

More reformulation based on assumptions. Conformance is
handled slightly differently. Since VQ-Handoff deals with
handoffs which are initiated automatically, the analyst
can take advantage of this specialized case knowledge and
assume conformance is always true. To assume otherwise
would imply one is not in an automatic handoff situation
and thus would be outside the scope of VQ-Handoff.

Note that it might be tempting to abstract away 
conformance by assuming inhibited-handoff is false, but
this would fail because inhibited-handoff also influences 
other events which are directly involved in the valida- 
tion question (see figure 12). Such assumptions which 
influence system behavior with respect to the valida- 
tion question are not acceptable. 
Figure 12: Influence Graph of the relation Inhibited-Handoff

Another use for assumptions is to decompose validation
questions into smaller, more manageable pieces, similar to
how assumptions are used in proofs to break them up into
multiple, hopefully more manageable, pieces. In the case
of automatic versus manual initiation of handoff, a
well-chosen assumption creates two distinct scenarios
which are then handled separately.

Reformulation to introduce qualitative abstractions. An
earlier version of the validation question VQ-Handoff was
expressed in terms of track-position. Likewise, many
event preconditions were expressed in terms of
track-position. Rather than describe scenarios in terms
of track-positions, the analyst decided to reformulate
the specification to introduce qualitative abstractions.
ASC facilitated this by showing what concepts were
influenced by track-positions. The analyst was then able
to use ARIES transformations to replace complex
predicates about track-position with new predicates
expressed in terms of newly defined relations. These new
relations were top-of-block-altitude,
within-handoff-computed-point,
within-accept-handoff-computed-point-distance, and
within-accept-handoff-computed-point-time. Each is a
specialization of track-position. This now allowed the
analyst to easily describe validation questions and
scenarios in terms of these qualitative states rather
than in terms of track-position, which has many more
states, each of which falls into one of these five
qualitative states.

Reformulating scenarios as run-time constraints. Not all
optimizations are realizable during specialized
specification construction. This problem is pointed out
by Meyer [Meyer, 1991] when applying partial evaluation
to imperative languages. The problem is that compile-time
execution can result in side-effects which are not
noticed at the appropriate time, because the side-effect
could happen during specialized specification
construction and not during simulation. Other parts of
the specification which trigger on the side-effects of
the partially evaluated scenario will then not have those
side-effects to react to at run-time. As a result,
partial evaluation at specialized specification
construction time must be constrained not to do anything
that causes triggering states to disappear.

In VQ-Handoff, the handoff scenario includes only
automatic-init-handoff and accept-handoff. It excludes
several other events that could be a part of a typical
handoff scenario (e.g., manual-init-handoff,
reject-handoff, and cancel-handoff). ASC translates these
scenarios into a procedural invariant which ensures the
appropriate behavior at run-time. The advantage of such a
constraint is that one is not forced to deal with control
issues regarding the full set of events until the
pairwise (in this case automatic-init-handoff and
accept-handoff) interaction has first been resolved.

Horizon Effect Addressed 

ASC mitigates the horizon effect by trying to create a 
specialized specification which defines a closed simula- 
tion model. In such a model there are no outside influ- 
ences and all influence paths within the closed model 
are known, thus there is no horizon effect. 
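The closure condition can be checked mechanically on an influence graph. The sketch below is an illustration only; the function names are hypothetical, and the example edges reuse concept names from the earlier next-controller discussion with assumed directions.

```python
def outside_influences(model_nodes, edges):
    """Influences that enter the model from concepts outside it."""
    return sorted(
        (s, t) for (s, t) in edges if t in model_nodes and s not in model_nodes
    )


def is_closed(model_nodes, edges):
    """A model is closed when no influence crosses into it from outside."""
    return not outside_influences(model_nodes, edges)


model = {"next-controller", "automatic-init-handoff"}
edges = {
    ("flight-plan", "next-controller"),
    ("next-controller", "automatic-init-handoff"),
}
open_edges = outside_influences(model, edges)

# After flight-plan is abstracted out, the remaining influences are internal.
edges_after = {("next-controller", "automatic-init-handoff")}
closed_after = is_closed(model, edges_after)
```

Each pair returned by `outside_influences` marks a place where reformulation still has to either pull a concept into the model or abstract it away.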

With respect to the current example, reformulation 
continues until a closed model is achieved. The final 
specialized specification contains 44 influence nodes. 

Summary 

The success of the ARIES Simulation Component may 
be measured with respect to the following criteria. 


• ability to execute a specification where previously it 
could not be done. 

• ability to execute a specification with less effort than 
was required before. 

• ability to document requirements satisfaction. 

• ability to make validation comprehensible to stake- 
holders. 

• ability to provide a flexible approach for system val- 
idation. 

At this point it is too early to address most of these
issues. I have only applied ASC to a few validation
questions, all within the domain of ATC handoffs. The
most quantifiable results to date concern the number of
concepts involved in the specification of VQ-Handoff
(figure 13).



                                   number of concepts
ATC Knowledge Base                 1,400
Primitive Influence Graph          224
Initial Influence Graph            97
Final Specialized Specification    44

Figure 13: Number of Concept Declarations for Validation
Question VQ-Handoff

This illustrates that only a fraction of the total 
number of possible concepts were actually needed to 
achieve executability. Additionally, the focus provided 
by the validation question provided direction on which 
concepts to flesh out next in order to achieve closure 
and executability. Granted, these techniques could be, and
to some degree have been, applied by hand when one
builds a rapid prototype. The difference is that ASC
generates both a rapid prototype and a formal charac- 
terization of how it relates to the original specification. 

As a sidebar, in the process of constructing the spe- 
cialized specification I discovered several errors. Many 
of these were in fact errors in the specification which 
I believed was essentially correct. This seems good, 
in that error discovery is an important precursor to 
validation. 
Introduction

Many important applications can be formalized as 
constrained optimization tasks. For example, we are 
studying the engineering domain of two-dimensional 
(2-D) structural design. In this task, the goal is to de- 
sign a structure of minimum weight that bears a set of 
loads. 

Figure 1 shows a solution to a design problem in which
there is a single load (L) and two stationary support
points (S1 and S2). The solution consists of four
members, E1, E2, E3, and E4, that connect the load to the
support points. In principle, optimal solutions to
problems of this kind can be found by numerical
optimization techniques. However, in practice
[Vanderplaats, 1984] these methods are slow and they can
produce different local solutions whose quality (ratio to
the global optimum) varies with the choice of starting
points. Hence, their applicability to real-world problems
is severely restricted.

To overcome these limitations, we propose to aug- 
ment numerical optimization by first performing a 
symbolic compilation stage to produce (a) objective 
functions that are faster to evaluate and that depend 
less on the choice of the starting point and (b) selection 
rules that associate problem instances to a set of rec- 
ommended solutions. These goals are accomplished by 
successive specializations of the problem class and of 
the associated objective functions. In the end, this pro- 
cess reduces the problem to a collection of independent 
functions that are fast to evaluate, that can be differen- 
tiated symbolically, and that represent smaller regions 
of the overall search space. However, the specialization
process can produce a large number of sub-problems. This
is overcome by inductively deriving selection rules which
associate problems with small sets of specialized
independent sub-problems. Each set of candidate solutions
is chosen to minimize a cost function which expresses the
tradeoff between the quality of the solution that can be
obtained from the sub-problem and the time it takes to
produce it. The overall solution to the problem is then
obtained by solving in parallel each of the sub-problems
in the set and selecting the one with the minimum cost.


In addition to speeding up the optimization process, our
use of learning methods also relieves the expert of the
burden of identifying rules that exactly pinpoint optimal
candidate sub-problems. In real engineering tasks it is
usually too costly for engineers to derive such rules.
Therefore, this paper also contributes a further step
toward the solution of the knowledge acquisition
bottleneck [Feigenbaum, 1977], which has somewhat
impaired the construction of rule-based expert systems.



Figure 1: A solution to a 2-D structural design problem 
with given topology. 

Our optimization schema differs from techniques currently
used in the machine learning community. Our approach
relies on the specialization of the problem via
incorporation of constraints prior to optimization.
Braudaway [Braudaway, 1988] designed a system along the
same principle. However, to our knowledge, very little
work has been done in using learning techniques to speed
up numerical optimization tasks. In contrast, the current
trend in the machine learning community focuses on
methods, such as Explanation Based Learning (EBL)
[Ellman, 1989], capable of generating rules. In addition,
EBL methods have had little success in the task of
optimizing numerical procedures. We conjecture that one
of the reasons is the dependence of EBL methods on the
trace of the problem solver. The trace of a numerical
optimizer gives little information on the structure of
the problem. Therefore, in mathematical domains,
EBL-derived rules are too detailed to produce any
appreciable speedup.

The remainder of the paper is organized as follows. The
next section, Task description, presents the 2-D
structural design task. This is followed in the section
Knowledge-based Optimization by an overview of numerical
optimization methods, their limitations, and our
solution, which is illustrated using a simple example.
The machine learning methods are outlined in the section
Machine Learning Methods. These methods are then applied
in a later section, which illustrates the experiments.
These show that, for a certain family of problems, the
compilation stage produces a substantial improvement in
the performance of the optimization methods. Benefits and
limitations of our strategy are summarized in the final
section, which also outlines future work.

Task description 

Table 1: The 2-D Design Task.

Given: A 2-dimensional region R
       A set of stable points (supports)
       A set of external loads with application points
       within R

Find:  The number of members, connectivity, and positions
       of all intermediate connection points such that the
       structure has minimum weight and is stable with
       respect to all external loads.

Table 1 describes the 2-dimensional structural design
task that we are attacking. Figure 1 shows an example
problem in which L is the load and S1 and S2 are two
supports. The so-called “topology” is given as a graph
structure containing four edges (the members) and four
vertices (the load, the two supports, and an intermediate
connection point C). The topology does not specify the
lengths of the members or the location of C. The topology
and the position shown in the figure give the
minimum-weight solution. In this solution, 4 members are
used and E1 and E3 are in tension (they are being
“stretched”), while members E2 and E4 are in compression.
Tension members will be referred to as “rods” and
indicated by thin lines. Compression members will be
referred to as “columns” and indicated by thick lines.
The type of members used in the solution is an
abstraction that we have used throughout our work. To
indicate a configuration of tensile and compressive
members that constitutes a solution, we have defined the
stress state. The stress state is an array of m elements
in which each element corresponds to a member. The value
of each element in the array is +1 if the member is
tensile and -1 if the member is compressive.
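As a small illustration of the stress state, the sketch below maps signed member forces to the +1/-1 array. The sign convention (positive force means tension), the function name, and the numeric forces are assumptions made for the example.

```python
def stress_state(forces, tolerance=0.0):
    """Map signed axial forces to +1 (tensile) or -1 (compressive).

    Assumed convention: a positive force means the member is in tension.
    """
    state = []
    for force in forces:
        if force > tolerance:
            state.append(1)
        elif force < -tolerance:
            state.append(-1)
        else:
            raise ValueError("member carries no force; stress state undefined")
    return state


# Hypothetical forces for members E1..E4 (E1 and E3 tensile, E2 and E4 compressive).
example_state = stress_state([120.0, -80.0, 45.5, -10.0])
```

With m members there are at most 2^m distinct stress states, which is why selecting a few promising ones matters later in the paper.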

The weight of a truss can be decreased in at least 
two ways. First, the engineer can use lighter material. 
Second, the “shape” can be designed in such a way 
that, for instance, it uses less material and, hence, it is 
lighter. In this paper we do not consider the (admittedly)
important advances in the science of materials
but, instead, we focus on the synthesis of shapes that 
reduce the weight of a truss with a chosen construction 
material. 

The task shown in Table 1 is actually only one step 
in the larger problem of designing good structures. 
In general, structural design proceeds in three steps 
[Palmer and Sheppard, 1970; Vanderplaats, 1984]. 
First, the problem solver chooses the topology, which 
specifies the locations of the loads and supports and the 
connectivity of the members. Then, the second step 
is to determine the locations of the connection points 
(and hence the lengths, locations, internal forces, and 
cross-sectional areas of the members) so as to mini- 
mize the weight of the structure. This is usually ac- 
complished by numerical non-linear optimization tech- 
niques. The third and final step in the process opti- 
mizes the shapes of the individual members. This can 
often be accomplished by linear programming. 

In addition to focusing only on the first two steps, 
we have introduced several simplifying assumptions to 
provide a tractable testbed for developing and test- 
ing machine learning methods. Specifically, we as- 
sume that structural members are joined by frictionless 
pins, only statically determinate structures are consid- 
ered, the cross section of a column is square, columns 
and rods of any length and cross sectional area are 
available, and supports have no freedom of movement. 
A statically determinate structure contains no redundant
members, and hence, the geometrical layout completely
determines the forces acting in each member.

Given these assumptions, the weight of a candidate
solution is usually calculated by a three-step process.
The first step is to apply the method of joints [Wang 
and Salmon, 1984] to determine the forces operating in 
each member. Once this is known, the second step is to 
classify each member as compressive or tensile. This is 
important, because compressive and tensile members 
are composed of different materials and have different 
densities; e.g. concrete columns and high tensile steel 
rods. The third step is to determine the cross-sectional 
area of each member. The load that a member can 
bear is assumed to be linearly proportional to its cross- 
sectional area. Finally, the weight of each member can 
be computed as the product of the density of the ap- 
propriate material, the length of the member, and the 
cross-sectional area of the member. 

The last two steps can be collapsed into a single
parameter k: the ratio of the density
per-unit-of-force-borne for compressive members to the
density per-unit-of-force-borne for tensile members. With
this simplification, instead of minimizing the actual
weight, we can minimize the following quantity which,
with an abuse of notation, we define as

$$
\mathrm{Weight} = \sum_{i \,\in\, \text{tensile members}} \|F_i\|\, l_i
  \;+\; \sum_{j \,\in\, \text{compressive members}} k\, \|F_j\|\, l_j .
$$

$F_i$ is the force in member $i$, and $l_i$ is the length
of member $i$. This is the initial objective function for
the work described in this paper.
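The objective above translates directly into a few lines of code. In this sketch the function name and the sign convention (negative force means compression) are assumptions; the numbers are invented for the example.

```python
def weight(forces, lengths, k):
    """Weight objective: sum |F| * l over members, scaled by k when compressive.

    forces: signed axial forces (assumed convention: negative = compression).
    lengths: member lengths, in the same order as forces.
    k: density-per-unit-of-force ratio of compressive to tensile members.
    """
    total = 0.0
    for force, length in zip(forces, lengths):
        factor = k if force < 0 else 1.0
        total += factor * abs(force) * length
    return total


# One tensile member (100 * 2) and one compressive member (4 * 50 * 3).
example_weight = weight([100.0, -50.0], [2.0, 3.0], k=4.0)
```

The branch on the sign of each force is what makes the objective piecewise: fixing a stress state removes the branch and leaves a single smooth expression.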

We conclude this section with a brief description of the
method of joints, which is one of the methods used to
calculate the $F_i$ in statically determinate structures.
The method of joints computes these forces by solving a
system of linear equations as illustrated, for the
problem in Figure 1, in Table 2. The matrix of
coefficients is called [Wang and Salmon, 1984] the axial
(or static) matrix and the vector of givens is defined as
the load vector. In Figure 1, let $C = (x, y)$,
$S1 = (x_1, y_1)$, and $S2 = (x_2, y_2)$ be the cartesian
coordinates of the connection point and the two supports,
respectively. In addition, let $(x_l, y_l)$ be the
coordinates, and let $p$ and $\gamma$ be the magnitude
and direction, of the load L. The internal forces in each
member are obtained by first constructing the axial
matrix and load vector and then solving the system of
equations for the unknown internal forces. Table 2 shows
the symbolic system of equations for the example in
Figure 1 with unknown forces $F_1$, $F_2$, $F_3$, and
$F_4$ and with the coordinates of all the points
explicitly substituted.
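The following sketch shows one way the method of joints can be set up and solved. It is not the system of Table 2: it uses a hypothetical two-member truss (one loaded joint hanging from two supports) because the member connectivity of Figure 1 is not given in the text. Function names and the sign convention (positive force means tension, so the right-hand side is the negated load) are assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular axial matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def member_forces(coords, members, free_joints, loads):
    """Return one signed axial force per member (positive = tension)."""
    n = len(members)
    if 2 * len(free_joints) != n:
        raise ValueError("need exactly two equations per unknown pair")
    axial = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for ji, joint in enumerate(free_joints):
        fx, fy = loads.get(joint, (0.0, 0.0))
        rhs[2 * ji], rhs[2 * ji + 1] = -fx, -fy
        for mi, (p, q) in enumerate(members):
            if joint not in (p, q):
                continue
            other = q if joint == p else p
            dx = coords[other][0] - coords[joint][0]
            dy = coords[other][1] - coords[joint][1]
            length = math.hypot(dx, dy)
            # Unit vector from this joint toward the member's other end.
            axial[2 * ji][mi] = dx / length
            axial[2 * ji + 1][mi] = dy / length
    return solve_linear(axial, rhs)


# Unit load pulling joint L straight down, hung from two supports above it.
coords = {"L": (0.0, 0.0), "S1": (-1.0, 1.0), "S2": (1.0, 1.0)}
members = [("L", "S1"), ("L", "S2")]
forces = member_forces(coords, members, free_joints=["L"], loads={"L": (0.0, -1.0)})
```

Both members come out in tension with equal force, as symmetry requires. The elimination step is the cubic-cost part that the paper later identifies as the main expense of each objective evaluation.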

Now that we have defined the 2-dimensional design 
task and formulated it as a non-linear optimization 
problem, let us turn, in the next section, to a brief 
review of existing techniques for optimization and to 
the proposed methods. 

Knowledge-based Optimization 

Classical optimization textbooks [Vanderplaats, 1984; 
Papalambros and Wilde, 1988] present a comprehen- 
sive survey of optimization methods and of various 
techniques for conducting the search for an optimal 
solution. The schema illustrated in Figure 2 is typical 
of many domain independent non-linear optimization 
methods. The process is iterative. Starting at some 
initial point, the objective function is evaluated and 
the termination criteria are tested. If the test fails, 
a new point is generated by taking a step, of some 
chosen length in some chosen direction, away from the 
current point. Each point defines a set of values for 
the independent variables in the objective function. 
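The iterative schema can be sketched with a simple compass (coordinate) search, which uses only evaluations of the objective function. This is a stand-in chosen for brevity, not Powell's method or any optimizer used in the paper; the function name, parameters, and the toy objective are assumptions.

```python
def compass_search(objective, start, step=1.0, min_step=1e-6, max_iter=10000):
    """Minimize `objective` by stepping along coordinate axes.

    Returns (best_point, best_value, number_of_evaluations).
    """
    point = list(start)
    value = objective(point)          # evaluate at the initial point
    evaluations = 1
    for _ in range(max_iter):
        if step < min_step:           # termination test
            break
        improved = False
        for i in range(len(point)):
            for direction in (1.0, -1.0):
                candidate = list(point)
                candidate[i] += direction * step   # step of chosen length and direction
                candidate_value = objective(candidate)
                evaluations += 1
                if candidate_value < value:
                    point, value, improved = candidate, candidate_value, True
        if not improved:
            step /= 2.0               # refine the step when no move helps
    return point, value, evaluations


def bowl(p):
    """Toy unimodal objective with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best_point, best_value, n_evals = compass_search(bowl, [0.0, 0.0])
```

The evaluation counter is returned because, as the text goes on to argue, the number of objective evaluations dominates the cost of this kind of search.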

Most optimization algorithms differ primarily in the
criteria used to choose the direction along which to
optimize. Some optimization methods (e.g., Powell's
method [Vanderplaats, 1984]) choose the direction and
step size using only evaluations of the objective
function. Other methods, such as gradient descent and its
variations [Papalambros and Wilde, 1988], require
computation of the partial derivatives of the objective
function to choose the new direction of optimization.
Still other methods approximate the partial derivatives
numerically by evaluating the objective function at many
points.

Table 2: Method of Joints for the example in Figure 1.
The product of the axial matrix and of the unknown forces
$F_j$ equals the load vector.

The primary computational expense of numerical 
optimization methods is the repeated evaluation of 
the objective function. An advantage of gradient de- 
scent methods is that they need to evaluate the objec- 
tive function less often, because they are able to take 
larger, and more effective steps. Of course, they incur 
the additional cost of repeatedly evaluating the par- 
tial derivatives of the objective function. Hence, they 
produce substantial savings only when the reduction 
in the number of function evaluations offsets the cost 
of evaluating the derivatives. 
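The break-even condition in this paragraph can be written as a simple inequality. The symbols are illustrative assumptions, not notation from the paper: $N_f$ and $N_g$ are the numbers of steps needed by a function-based and a gradient-based method, $c_f$ is the cost of one objective evaluation, and $c_g$ is the cost of one gradient evaluation. Assuming one objective evaluation per step for both methods, plus one gradient evaluation per step for the gradient-based method, the gradient-based method is cheaper when

$$
N_g \,(c_f + c_g) \;<\; N_f \, c_f
\quad\Longleftrightarrow\quad
\frac{c_g}{c_f} \;<\; \frac{N_f}{N_g} - 1 .
$$

For example, if the gradient-based method needs one tenth of the steps ($N_f / N_g = 10$), it pays off as long as one gradient evaluation costs less than nine objective evaluations.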

In engineering design, the objective function is
typically very expensive to evaluate. This slows the
numerical optimization process because the speed of
numerical optimization is determined by the cost and
frequency of evaluating the objective function. For the
structural design domain, to compute the objective
function (volume of each structure) a system of linear
equations must be solved. This is typically carried out
by algorithms which are cubic in the number of unknowns.
This number is usually large in real applications like
bridge design. Furthermore, the fact that the constant k
is applied only to compressive members makes it
impossible to obtain a differentiable closed form. The
signs of the internal forces must be computed before it
is possible to determine which members are compressive.
This prevents the use of gradient-based optimization
methods that require fewer evaluations of the objective
function; only slower function-based methods are
applicable. One measure of the performance of a numerical
optimizer is the time it takes to produce a solution.
This quantity, however, depends on the choice of the
starting point. Therefore, to obtain an accurate
measurement, it is necessary to average the values
obtained running the optimizer from different starting
points.

Figure 2: Traditional optimization schema.

Moreover, most engineering models are not unimodal. This
directly affects the reliability of the solutions because
numerical optimizers settle for local minima since they
are unable to leap from one region to another to
determine the global minimum. As shown in Figure 3, the
objective function for the structural design domain is
not unimodal. For instance, for the problem in Figure 1
gradient methods started with x = 1500 and y = 2000 reach
a local minimum in region R2 while the global minimum is
in region R1. A measurement of the reliability can be
obtained by taking the ratio (quality) of the local
minimum and of the global minimum in controlled
experiments in which the absolute minimum can be easily
computed. Time and quality induce a tradeoff that can be
exploited by defining the function:

$$
\mathrm{utility}(\mathit{solution}) =
  \mathrm{CPUtime}(\mathit{solution}) \cdot \mathrm{CPUcost}
  + \mathrm{quality}(\mathit{solution})
$$

where CPUcost is a positive constant that accounts for
the cost of running the optimizer. We have used this
definition in the learning stages of our approach to
focus the attention of the optimization process on a few
candidates that will produce solutions of maximum
utility.

As shown in Figure 4, the increased reliability and speed
are accomplished by augmenting the traditional run-time
optimization with a “compilation” stage prior to
numerical optimization. The inputs to the compiler are
(a) a high-level description of the problem, (b) domain
knowledge about stress states, and (c) a procedure to
generate training examples. Symbolic and inductive
techniques are then used to (1) produce simplified
versions of the objective function for each stress state,
and (2) learn stress state selection rules which map
problem instances into sets of candidate stress states of
minimum cost.

Figure 3: Volume of the structure in Figure 1.

First, the compiler produces one objective function 
for each topology and stress state. Each of these func- 
tions is a specialized version of the expression of the 
weight and it is faster to evaluate than the original, 
less specific, objective function. As an example, the 
function produced for the topology and stress state in 
Figure 1 is illustrated in Table 3. This expression is a 
closed form of the weight of a structure as a function 
of the two cartesian coordinates of connection point C 
restricted to region R1 in Figure 3. Moreover, these
simplified expressions are differentiable and this per- 
mits the use of faster gradient-based optimization al- 
gorithms. 

Figure 4: Proposed numerical optimization framework.

Another obstacle to practical applications of numerical
optimization methods is the high dimensionality (number
of independent variables) of the problems. Our
compilation strategy decreases the dimensionality of
optimization problems by searching a set of training
examples for relations (regularities) among independent
variables. These relations are then used as constraints
among variables and are incorporated into the specialized
versions of the objective function. This procedure
eliminates independent variables, with the result of
greatly simplifying the optimization process, enlarging
its scope of applicability, and speeding up run-time
optimization. For the region R1 in Figure 3, the compiler
will determine that if the connection point is expressed
in polar coordinates ρ and α, only the distance ρ from
support S1 need be determined (see Figure 1). This is
because, in the analysis of the examples, it will
discover that the angle α can be computed as one half of
the angle β, which is one of the givens of the problem.
The final objective function is shown in Table 4, which
contains only a single variable ρ vs. the two (x and y)
in the expression in Table 3. This final expression
indicates a reduction in dimensionality because, at run
time, the numerical optimizer will only need to determine
the value of ρ to compute the position of the connection
point.

Finally, the compiler learns search control knowledge
in the form of IF-THEN-ELSE rules. This knowledge is
then used at run time to select stress states that lead
quickly to quasi-optimal solutions. The set of stress
states is chosen so that the utility of the stress
states is maximized. The utility is a function that
combines the time it takes to produce a solution with
its expected quality (ratio to the global minimum).
This function introduces a tradeoff between quality and
time that is exploited by the learning algorithm
[Cerbone and Dietterich, 1992]. As an example, for the
design problem in Figure 1, whose objective function is
shown in Figure 3, the compiler derives search control
knowledge that allows the problem solver to focus the
attention of the numerical optimizer on regions R1 and
R2 when the load is directed toward support S2 and away
from support S1.

Table 3: Partially evaluated objective function for the
problem of Figure 1. (Exponents marked ? are illegible
in the source.)

\[
\mathrm{Weight} =
\frac{
\begin{array}{l}
1.14\cdot10^{13}\,x - 5.66\cdot10^{9}\,x^{2} + 8.16\cdot10^{5}\,x^{3}
 + 3.28\cdot10^{13}\,y - 3.26\cdot10^{9}\,xy \\
{} + 2.44\cdot10^{5}\,x^{2}y - 6.70\cdot10^{9}\,y^{2}
 + 8.16\cdot10^{5}\,xy^{2} + 2.44\cdot10^{?}\,y^{3} - 4.08\cdot10^{16}
\end{array}
}{
1.28\cdot10^{?}\,xy - 2.56\cdot10^{4}\,x + 2.56\cdot10^{4}\,y
 - 6.40\,y^{2} - 2.56\cdot10^{7}
}
\]

Machine Learning Methods 

This section describes in greater detail the symbolic 
and inductive learning techniques. Inductive learning 
techniques are used to (a) simplify the optimization 
process by reducing the number of independent vari- 
ables and (b) derive the stress state selection rules. 
The inductive methods rely upon knowledge about the 
partitioning of the design space and upon a set of train- 
ing examples that, for many engineering tasks, can be 
generated by the compiler. A complete discussion of 
the compilation stages can be found in [Cerbone, 1992].

Symbolic Methods. Symbolic techniques are used to
incorporate into the objective function knowledge about
stress states and knowledge discovered during inductive
analysis. The goal is to produce a highly simplified
and specialized objective function. This is
accomplished by partial evaluation [Futamura, 1971] and
loop unrolling [Burstall and Darlington, 1977], two
techniques widely used in high-end optimizing
compilers. Partial evaluation incorporates constant
values for variables into functions (or programs) and
simplifies them. Loop unrolling unfolds iterative
constructs (e.g., for loops) and transforms them into
sequential programs. These techniques have been
implemented using the Mathematica programming language
([Wolfram, 1988] and [Maeder, 1989]), which is suitable
to numerical problems.

Table 4: Objective function for the structure in
Figure 1 with reduced dimensionality.
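Partial evaluation and loop unrolling can be illustrated
with a small sketch. It is written in Python rather than
Mathematica, and the toy weight formula (k times |force|
times length, summed over members) as well as the names
weight_generic and specialize are assumptions made for
illustration only; they are not taken from the compiler
described here.

```python
def weight_generic(forces, lengths, k):
    """Generic objective: loop over members, summing k * |force| * length."""
    total = 0.0
    for f, length in zip(forces, lengths):
        total += k * abs(f) * length
    return total


def specialize(lengths, k):
    """Partially evaluate weight_generic for fixed lengths and k.

    The loop is unrolled into one straight-line expression, and the
    constant products k * length are folded at specialization time.
    Returns the specialized function and its source text.
    """
    coeffs = [k * length for length in lengths]
    terms = " + ".join(f"{c!r} * abs(f{i})" for i, c in enumerate(coeffs))
    args = ", ".join(f"f{i}" for i in range(len(coeffs)))
    source = f"lambda {args}: {terms}"
    return eval(source), source


specialized, source = specialize([2.0, 3.0], 0.5)
print(source)
print(specialized(4.0, -2.0), weight_generic([4.0, -2.0], [2.0, 3.0], 0.5))
```

The specialized function contains no loop and no
multiplication by k at run time; both were resolved when
the givens were plugged in.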

As an example of specialization, we illustrate how 
domain knowledge is used to specialize the objective 
function. First the problem solver chooses the topol- 
ogy. This can be simply done by enumerating a few 
possible configurations. Once the topology is chosen, 
it can be incorporated into the objective function. This 
allows us to compute symbolically the axial matrix and 
the load vector (see Section ). We then apply sym- 
bolic algorithms to solve and simplify the system of 
equations and to obtain a closed-form expression for 
the forces. In principle, an infinite number of
topologies should be explored; however, Friedland
[Friedland, 1971] experimentally demonstrated that only
a few of them need be considered to achieve
satisfactory solutions.

The second specialization step is to plug in the givens 
of the problem and partially evaluate the resulting 
mixed symbolic/numeric expression. For our exam- 
ples, the givens of the problems are the loads and sup- 
ports; however, one may wish to analyze a structure 
subject to different inputs such as various loading con- 
ditions or support locations. In such cases it is possible 
to leave those values in symbolic form and substitute 
their numerical values at run time. 

The third compilation step is to split the objective
function V into cases according to stress state. When
the objective function is specialized according to
stress state, the result is a collection of
special-case objective functions {V_1, ..., V_n}.
Because each V_j corresponds to one stress state, it is
possible to tell, at compile time, which forces should
be multiplied by k. Hence, each V_j is differentiable,
and this enables us to employ gradient-based
optimization techniques that, typically, are faster
than methods based only on evaluating the objective
function.
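The effect of splitting by stress state can be sketched
on a toy problem. The sketch below assumes an objective
in which tensile forces count with weight 1 and
compressive forces with weight k; the two linear member
forces, the value of K, and the names v_generic and
make_v are hypothetical and serve only to show why each
special case is smooth.

```python
K = 3.0  # assumed factor applied to compressive (negative) forces


def forces(x):
    """Toy member forces as functions of a single design variable x."""
    return (x - 1.0, 2.0 - x)


def v_generic(x):
    """Unspecialized objective: the run-time sign test makes it non-smooth."""
    return sum(f if f >= 0.0 else -K * f for f in forces(x))


def make_v(state):
    """Specialize to a stress state, a tuple of 'T' (tension) or 'C'.

    Inside one stress state the signs are known in advance, so the
    objective and its derivative are plain closed-form expressions.
    """
    signs = [1.0 if s == "T" else -K for s in state]

    def v(x):
        return sum(c * f for c, f in zip(signs, forces(x)))

    def dv(x):
        # The toy forces have derivatives +1 and -1 with respect to x.
        return signs[0] * 1.0 + signs[1] * (-1.0)

    return v, dv


v_ct, dv_ct = make_v(("C", "T"))   # valid where x < 1
print(v_ct(0.0), v_generic(0.0), dv_ct(0.0))
```

Within the region where the assumed stress state
actually holds, the specialized function agrees with
the generic one and has a gradient available at no
extra cost.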

Reduction of independent variables. A further speedup
and increase in reliability of the numerical optimizers
is obtained using inductive methods to decrease the
number of independent variables (dimensionality) in the
numerical optimization problem. The compiler is given a
series of examples and uses them to inductively
determine which independent variables can be computed
as functions of known quantities. For instance, in the
design domain, when searching within a region it might
turn out to be superfluous to search along all
dimensions because there might exist a simple
relationship between one of the coordinates and known
quantities like the location of loads and supports.
These relations are then used as constraints and are
incorporated into the objective functions. The result
is the reduction of the number of independent
variables. This, in turn, produces an even simpler and
faster optimization problem. For instance, the function
shown in Table 3 has two independent variables, while
the corresponding inductively simplified version, shown
in Table 4, has only one. Hence, the final optimization
problem entails a simple one-dimensional search, while
the original one has two dimensions.

The variables to be eliminated are determined using 
an EBL-like approach which employs: 

• training examples 

• a library of given geometric entities (points, angles,
etc.)

• a geometrical domain theory 

• known relationships among geometric entities 

• regularities - a mixture of heuristics and statistical 
regression techniques. 

Each unknown connection point is subject to a compile 
time heuristic search process that attempts to compute 
(reformulate) the location as a function of loads and 
supports. 

To see how this works, let us consider again the ex- 
ample problem in Figure 1 which we shall refer to as 
the “bisector” example. In this example, the connec- 
tion point C is the unknown and the givens are the 
load L and the supports S1 and S2. Moreover, let us
assume that a set of training examples has been either 
provided or derived by the system. The reformulation 
starts by identifying all geometric objects using the 
given domain theory. For the bisector example, the 
system identifies, among others, the following geomet- 
ric objects: 

point(S1), point(S2), point(C), point(L),
angle(β, L, S1, S2), angle(α, C, S1, S2),
segment(SG1, S1, S2), ...

Predicates such as point and angle are basic elements
of the given geometric domain theory. This means that,
given a set of cartesian coordinates, the system is
capable of computing each predicate. During the
computation of each predicate, the system tags it as
given or unknown. A predicate is given if all the
entities used to compute it are either givens of the
problem (loads or supports) or can be expressed as a
combination of given predicates. Otherwise, the
predicate is tagged as unknown. For the bisector
example, point(C) and all predicates that involve it in
their derivation (e.g., angle(α, C, S1, S2)) are
unknowns; all others are givens.
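The tagging rule can be written down in a few lines.
The following is only a sketch: the dictionary encoding
of predicates, the string names, and the function
tag_predicates are assumptions chosen for illustration
and do not reproduce the system's actual data
structures.

```python
def tag_predicates(predicates, givens):
    """Tag each predicate as 'given' or 'unknown'.

    predicates: dict mapping a predicate name to the list of entities
                (or other predicate names) used to compute it.
    givens:     set of entity names supplied with the problem
                (loads and supports).
    """
    def is_given(name, seen=()):
        if name in givens:
            return True
        if name not in predicates or name in seen:
            return False  # a raw unknown entity, or a circular derivation
        return all(is_given(dep, seen + (name,)) for dep in predicates[name])

    return {name: "given" if is_given(name) else "unknown"
            for name in predicates}


bisector = {
    "point(S1)": ["S1"],
    "point(S2)": ["S2"],
    "point(L)": ["L"],
    "point(C)": ["C"],
    "angle(beta)": ["L", "S1", "S2"],
    "angle(alpha)": ["C", "S1", "S2"],
    "segment(SG1)": ["S1", "S2"],
}
tags = tag_predicates(bisector, givens={"S1", "S2", "L"})
print(tags)
```

For the bisector example this marks point(C) and
angle(alpha) as unknown and everything else as given,
matching the description above.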

With this knowledge, the system then tries to relate
the unknown geometric entity point(C) to as many other
entities as possible, with the ultimate goal of
expressing it using only given geometric entities. This
is accomplished by using a blend of EBL and discovery
techniques. In the EBL jargon, the geometric knowledge
base is the domain theory, point(C) is the target
concept, and the operationality criterion is the fact
that a concept must be expressed in terms of known
geometric objects. To visualize this reformulation
step, let us refer to the derivation tree in Figure 5.
The rightmost branch indicates that C is a connection
point and, therefore, it is no longer explored. The
leftmost branch, instead, uses a domain rule that
reformulates a point in polar coordinates. Intuitively,
the domain rule states that a point can be identified
by its distance ρ from S1 and by the angle α between
points C, S1, and S2. With this in mind, the system
recursively tries to determine angle(α, C, S1, S2) and
distance(ρ, C, S1). After having exhausted all proofs,
the system concludes that it is not possible to
re-express the angle and the distance in terms of known
entities. If we were to follow EBL strictly, we should
conclude that the domain theory is incomplete; that is,
it is not powerful enough to bridge the gap between
unknowns and givens. This, in turn, implies that the
search would terminate concluding that point(C) cannot
be re-expressed in terms of known geometric objects.

To overcome this problem we have used a discovery
approach that fills these knowledge gaps with eureka
[Burstall and Darlington, 1977] steps. Despite the
name, however, in our strategy these steps are not
arbitrary but inductive. For the example in Figure 1,
we determine that the angle α between points C, S1, and
S2 is exactly one-half the angle β between points L,
S1, and S2. Once this regularity is determined, in
contrast with Burstall and Darlington’s approach, we
test the eureka step against all user provided examples
to determine if it is a random occurrence or a
widespread phenomenon. In the former case, any use of
this regularity is abandoned and others (if any) are
tried. In the latter case, the regularity is assumed as
a transformation of the unknown geometric entity. This
is shown by the node in Figure 5 connected by the
dashed lines. The system then subgoals on the geometric
entities that were used to recognize the angle β. These
are recognized as givens because they were derived from
the position of the load and of the supports, and the
search terminates. The discussion of the branch
identified by the dotted lines is similar to the one
above and is omitted for the sake of brevity.
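Testing a candidate eureka step against the training
examples can be sketched as follows. The example
coordinates, the tolerance, and the acceptance rule
(the regularity must hold on every example) are
assumptions made for this illustration; the text does
not state the exact acceptance criterion.

```python
import math


def angle_at(vertex, p, q):
    """Unsigned angle in radians at `vertex` between the rays to p and q."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    d = abs(a1 - a2) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def bisector_regularity_holds(examples, tol=1e-6):
    """Check alpha == beta / 2 on every training example.

    Each example is a dict with the coordinates of S1, S2, the load L,
    and the optimal connection point C.
    """
    for e in examples:
        alpha = angle_at(e["S1"], e["C"], e["S2"])
        beta = angle_at(e["S1"], e["L"], e["S2"])
        if abs(alpha - 0.5 * beta) > tol:
            return False
    return True


examples = [
    {"S1": (0.0, 0.0), "S2": (1.0, 0.0), "L": (0.0, 1.0), "C": (1.0, 1.0)},
    {"S1": (0.0, 0.0), "S2": (2.0, 0.0),
     "L": (1.0, math.sqrt(3.0)), "C": (math.sqrt(3.0), 1.0)},
]
print(bisector_regularity_holds(examples))
```

If a single example violates the relation, the
candidate is rejected and another regularity would be
tried.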

The actual domain rules used in the geometric theory
also carry information that bridges the gap between the
cartesian representation of a point and the polar one.
This implies that the x and y coordinates of C can be
expressed in terms of the angle α and of the distance
ρ. In turn, the angle α is substituted by β/2, where β
can be computed from the given position of the load and
supports. These transformations are considered as
constraints and are incorporated into the objective
function, which is further simplified using the
symbolic techniques. The result of the incorporation is
shown in Table 4.
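As a worked form of this substitution, assume (these
conventions are ours, not stated in the text) that the
polar frame has its origin at S1, that α is measured
counterclockwise from the ray S1→S2, and that θ_0 is
the direction of that ray in the cartesian frame. Then
the two cartesian unknowns collapse to the single
unknown ρ:

\[
\begin{aligned}
x_C &= x_{S1} + \rho\cos(\theta_0 + \alpha), &
y_C &= y_{S1} + \rho\sin(\theta_0 + \alpha), &
\alpha &= \tfrac{1}{2}\beta, \\
\mathrm{Weight}(x_C, y_C) &\;\longrightarrow\;
\mathrm{Weight}\bigl(x_{S1} + \rho\cos(\theta_0 + \tfrac{1}{2}\beta),\;
y_{S1} + \rho\sin(\theta_0 + \tfrac{1}{2}\beta)\bigr).
\end{aligned}
\]

Since θ_0 and β are computed from the givens, the
right-hand side is a function of ρ alone.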

Rule derivation. The specialization steps discussed
above greatly improve the running time of the
optimizers on each objective function, but they might
introduce a large number of candidate solutions; in
principle, this number can be exponential. To overcome
this problem, we have devised a new inductive learning
method to prune candidates that do not lead to optimal
solutions. This method learns search control knowledge
in the form of decision trees, which can then be
quickly transformed into IF-THEN-ELSE rules. These
design rules associate features of the problem to a few
regions in which the global minimum is believed to lie
according to the examples given to the learning
algorithms. The global solution is then obtained by
running the optimizer on each of these regions and by
taking the minimum solution.
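The run-time use of such rules can be sketched as
below. Everything concrete here is hypothetical: the
three one-dimensional region objectives, the single
boolean feature, the rule in select_regions, and the
use of golden-section search as the optimizer are
stand-ins chosen so that the example is self-contained.

```python
import math


def golden_section(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi]; returns (x, f(x))."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    x = 0.5 * (a + b)
    return x, f(x)


# Hypothetical specialized objectives: (function, lower bound, upper bound).
REGIONS = {
    "R1": (lambda x: (x - 1.0) ** 2 + 2.0, 0.0, 3.0),
    "R2": (lambda x: (x - 5.0) ** 2 + 1.0, 3.0, 8.0),
    "R3": (lambda x: (x - 9.0) ** 2 + 4.0, 8.0, 10.0),
}


def select_regions(load_toward_s2):
    """Stand-in for a learned IF-THEN-ELSE rule."""
    if load_toward_s2:
        return ["R1", "R2"]
    else:
        return ["R3"]


def solve(load_toward_s2):
    """Optimize within each recommended region and keep the best result."""
    best = None
    for name in select_regions(load_toward_s2):
        f, lo, hi = REGIONS[name]
        x, value = golden_section(f, lo, hi)
        if best is None or value < best[2]:
            best = (name, x, value)
    return best


print(solve(True))
```

Only the regions named by the rule are searched, so the
cost grows with the size of the recommended set rather
than with the total number of stress states.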

We have found that most existing learning algorithms
are not suitable for learning rules for optimization
problems. The main obstacle is the absence of features
that allow discrimination among classes. Algorithms
like ID3 implicitly require independence of classes.
Features with such discriminatory power are difficult
to derive for many real applications and especially for
optimization tasks. On the other hand, it is relatively
easy to provide shallow features which can circumscribe
a set of possible solutions. Therefore, in devising our
learning method we have assumed that all features are
shallow and proposed UTILITYID3, a novel learning
algorithm. The algorithm resembles the well-known ID3
algorithm [Quinlan, 1987] in that it builds a decision
tree and uses an information-theoretic heuristic to
choose the feature on which to split at each recursive
call. However, it is new in that the heuristic takes
into consideration that the output is a set of
recommended actions rather than a single discriminating
class. This algorithm is fully described in [Cerbone,
1992] and [Cerbone and Dietterich, 1992].

In addition to the learning algorithm, we have
introduced the maximum utility learning set, a new
learning framework. In this framework, a utility is
associated to each candidate solution. The problem is
to learn a set of actions of maximum utility that
covers all given examples. For instance, in the design
problem, the utility is a function of the time it takes
the numerical optimizer to find a solution and of the
quality of that solution, where quality is measured
with respect to the globally optimal design. It turns
out that this learning problem is NP-complete [Garey
and Johnson, 1979]. Hence, UTILITYID3 uses an
approximation algorithm to determine a solution.
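To give a feel for what an approximation to this kind
of covering problem can look like, here is a generic
greedy heuristic. It is not the algorithm used by
UTILITYID3 (which is described in the cited
references); the scoring rule, the data layout, and the
name greedy_cover are assumptions made for
illustration, and utilities are assumed to be positive.

```python
def greedy_cover(examples, actions):
    """Greedy utility-weighted cover.

    examples: set of example identifiers that must all be covered.
    actions:  dict mapping an action name to (utility, solved), where
              solved is the set of examples that action handles.
    Repeatedly picks the action with the largest
    utility * (number of newly covered examples).
    """
    uncovered = set(examples)
    chosen = []
    while uncovered:
        best_name, best_score = None, 0.0
        for name, (utility, solved) in actions.items():
            if name in chosen:
                continue
            score = utility * len(solved & uncovered)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            raise ValueError("some examples cannot be covered")
        chosen.append(best_name)
        uncovered -= actions[best_name][1]
    return chosen


actions = {
    "A": (1.0, {1, 2}),
    "B": (0.5, {1, 2, 3}),
    "C": (1.0, {3, 4}),
}
print(greedy_cover({1, 2, 3, 4}, actions))
```

In the design setting, an action would correspond to a
stress state (region) to search, and an example to a
training design whose optimum lies in that region.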

Experiments 

To test the efficacy of this approach, we [Cerbone and 
Dietterich, 1991] have solved a series of design prob- 
lems using an implementation based on Mathemat- 
ica [Wolfram, 1988], and we have measured the impact 
of the compilation stages on the evaluation of the ob- 
jective function, on the optimization task, and on the 
reliability of the optimization method. The measure- 
ments presented are averages over five randomly gen- 
erated designs and, for each design, over 25 randomly 
generated starting points. 

Objective function. The objective function of each
design problem was evaluated in four different ways
and, for each of them, we averaged the CPU time¹ over
the different designs and starting points. The volume
was first computed using the traditional, naive,
numerical procedure with the method of joints. We then
compiled the designs incorporating, in three successive
stages, topological information, the givens of the
problems, and the stress state. Figure 6 shows the time
(per 100 runs) to evaluate the objective function at
the various compilation stages. The biggest speedup was
obtained with the numerical substitution of values into
the symbolic closed form expression and with the
specialization to stress states. This suggests that the
gain is related to the elimination of arithmetic
operations from the original numerical problem.

¹ The examples were run on a NeXT Cube with a 68030
board.

Figure 6: Influence of the compilation stage on the
CPU time per function evaluation.

Optimization. As indicated in Section , the running
time of the optimizers is influenced by the number of
function calls and by the time for each function
evaluation. To present the benefits of our approach on
the optimization task, we have experimented with two
optimization algorithms: (a) an optimizer based on
Powell’s method [Pike, 1986] that does not require
gradient information and (b) the version of conjugate
gradient descent [Press and others, 1988] provided by
Mathematica. The graphs in Figures 7 and 8 report,
respectively, the number of objective-function calls
and the overall CPU time for each optimizer. The values
connected by solid lines correspond to cases where the
optimizer had no gradient information, while the values
connected by dashed lines indicate averages utilizing
the conjugate gradient descent method with alternative
approximations for the gradient vector.

As expected, the number of evaluations remains constant
throughout the compilation stages when the non-gradient
method is used, while it decreases drastically when we
switch to the gradient-based optimization method.

Figure 7: Influence of the compilation stage on the
number of function calls.

The overall CPU time (Figure 8) steadily decreases as
well. For the non-gradient method, the decrease is due
to the progressive simplification of the objective
function itself, so that it is cheaper to evaluate.
When we switch to the gradient method, there is
initially no speedup at all, because the cost of
evaluating the full gradient offsets the decrease in
the number of times the objective function must be
evaluated. However, additional speedups are obtained by
approximating the objective function as a quadratic and
as a linear function (by truncating its Taylor series).

We have found experimentally that there is no ap- 
preciable difference between the minima reached using 
the full gradient vector and the minima computed us- 
ing quadratic approximations of the partial derivatives. 
However, the precision of the results obtained with the 
linear approximation is significantly reduced. Depend- 
ing on the application, this trade of accuracy for speed 
may be acceptable. If not, the quadratic approxima- 
tion should be employed. 

Another possibility is to employ the linear approx- 
imation for the first half of the optimization search, 
and then switch to the quadratic approximation once 
the minimum is approached. In other words, the linear 
approximation can be applied to find a good starting 
point for performing a more exact search. 



Figure 8: Influence of the compilation stage on the
CPU time.

Reliability. An optimization method is reliable if it
always finds the global minimum regardless of the
starting point of the search. Unfortunately, as shown
in Figure 3, the objective function in this task is not
unimodal, which means that simple gradient-descent
methods will be unreliable unless they are started in
the right “basin.” It is the user’s responsibility to
provide such a starting point, and this makes numerical
optimization methods difficult to use in practice.

From inspecting graphs like Figure 3, it appears that,
over each region corresponding to a single stress
state, the objective function is unimodal. We
conjecture that this is true for most 2-D structural
design problems. This means that optimization can be
started from any point within a stress state, and it
will always find the same minimum. If this is true,
then our “divide-and-conquer” approach of searching
each stress state in parallel will be guaranteed to
produce the global optimum.

We have tested this hypothesis by performing 20 trials
of the following procedure. First, a random starting
location was chosen from one of the basins of the
objective function that did not contain the global
minimum. Next, two optimization methods were applied:
the non-gradient method and the conjugate gradient
method. Finally, our divide-and-conquer method was
applied using, for each of the specialized objective
functions V_j, a random starting location that
exhibited the corresponding stress state. In all cases,
our method found the global minimum while the other two
methods converged to some other, local minimum.

Concluding Remarks

In this paper we have illustrated how machine learning
techniques can be applied to optimal engineering
design. This has been accomplished by tackling problems
in two different areas:

• speeding up existing numerical methods

• learning a set of candidate optimal solutions.

Table 5 illustrates the correspondence between these
problems and the machine learning techniques used in
their solution. Our main contribution is to have shown
that ML techniques can be effectively used to overcome
some of the drawbacks of numerical optimizers and to
increase their efficiency. Another contribution of this
paper is to have shown that inductive techniques can
complement traditional software engineering approaches
in mathematical domains. This greatly reduces the need
for knowledge transfer from experts to computer
systems. In our approach, these results required the
use of a blend of novel and traditional optimization
techniques. First, we have defined a new learning
framework which is more appropriate to optimization
tasks. This framework involves (a) the requirement that
the output of the learning algorithm be a set of
alternatives and (b) measures of the cost of obtaining
solutions. The learning methods produce sets of minimum
cost. Within this framework we have developed
algorithms which output IF-THEN-ELSE rules that
associate problem characteristics (features) to sets of
optimal solutions. This is a contribution to basic
research in machine learning. Second, we have
demonstrated that inductive methods can also be used to
simplify numerical problems. In fact we employed a
discovery approach to reduce the number of independent
variables. Finally, we have used more traditional
compiler optimization techniques in a learning
framework and merged them with inductive methods. We
have shown that the overall result is a drastic speedup
of the numerical optimization techniques.

Table 5: Rows enumerate problems in optimal design.
Columns list Machine Learning paradigms. X’s indicate
[…] the problem.

                                   Symbolic    Inductive
                                   Methods     Learning
Selection Rules                                    X
Speedup of Numerical Optimizers       X            X

Our approach opens new research directions into the so
far unexplored area of applications of machine learning
to numerical optimization. It is our hope that, in the
medium to long term, our techniques will allow the use
of specialized numerical optimizers in real-time
applications like intelligent CAD systems.

The initial goal of our research ([Cerbone, 1992],
[Cerbone and Dietterich, 1992]) was to provide Machine
Learning techniques to speed up numerical optimization.
However, in hindsight, we have found opportunities to
view part of our solution as a reformulation problem in
mathematical domains.

Mathematical domains present a unique opportunity 
to develop and to test reformulation techniques. A typ- 
ical mathematical task requires the solution of a set of 
equations subject to given constraints. As an example, 
in numerical optimization the problem is to determine 
the values of the independent variables that minimize 
an objective function subject to constraints on the val- 
ues of the variables. Solutions to mathematical tasks 
that arise in engineering or in physics often cannot be 
found by algebraic manipulation and numerical meth- 
ods must be employed to provide estimates. However, 
most optimization methods are slow and brittle. In 
fact, their speed and reliability depend on the choice 
of a starting point and on the formulation of the objec- 
tive function. This prevents their applicability in large 
scale real life tasks. 

To overcome these drawbacks, the user of numerical
methods spends an enormous amount of time reformulating
the equations into a form appropriate for solution. In
particular, for numerical optimization the goal is to
determine a representation that allows one to:

• Specialize the objective function to convex regions 

• Decrease the number of independent variables 

In the remainder of this overview, we outline the 
challenges and a few solutions to these representation 
changes in the design of lightweight frames to support 
loads. 

A typical sequence of steps adopted by a problem 
solver to optimize a design is illustrated by the dashed 
lines in Figure 1 on the left. First, the engineer uses 
her/his own knowledge and experience to formulate a 
numerical task. Second, where possible, numerical op- 
timization techniques are used to produce an optimal 
solution. Numerical optimization is typically a slow 
and brittle process. This is due to the fact that most 
numerical methods are hillclimbers. Thus, their speed
is greatly affected by the number of evaluations of the
objective function and by the time required for each
evaluation. As shown in [Cerbone, 1992], we have
devised techniques to reformulate the task to produce a
faster optimization. Under many circumstances, the
goal of speeding up the optimization conflicts with the 
goal of letting the engineer specify the objective func- 
tion in a highly abstract format. In fact, while sim- 
plifying the formulation of the problem, the abstract 
specification can greatly slow down and decrease the 
reliability of the numerical solution. This is because 
optimizers have no knowledge of the problem domain. 
Therefore, at run time they use the same objective 
function provided by the engineer for all regions. In 
addition, some objective functions that arise in
engineering are non-differentiable. This prevents the
use of powerful gradient-based numerical techniques;
only slower function-based methods are applicable. This is
especially true in the design task we are tackling. 

Our research augments the traditional problem solv- 
ing schema with the off-line knowledge compilation (or 
learning) stage illustrated by the solid lines in Figure 1
on the right. The compiler uses a blend of novel and 
traditional machine learning techniques to increase re- 
liability and speed of the numerical optimization task. 
These results are accomplished by reformulating at 
compile time the design problem into subproblems and 
by deriving: 

• Preprocessed objective functions for each subprob- 
lem 

• Search control knowledge that allows the problem 
solver to focus only on a few subproblems. 

The preprocessed functions contain fewer independent 
variables and have been greatly simplified. Therefore, 
they are faster to evaluate. At run time, the problem 
solver uses the search control knowledge derived dur- 
ing compilation to retrieve a few candidate solutions. 
Each of these candidates is then given as input to a nu- 
merical optimizer. However, in this case, the optimizer 
is given a simplified objective function. The net result 
is a speedup of as much as 95% over the run time of 
the traditional methods and a more reliable numerical 
optimization process. Being an off-line computation,
compilation does not introduce any overhead on the
run time operations.

The specialization of the objective function is di- 
vided into two stages: 

• Elimination of independent variables 

• Identification of the “correct” abstraction to parti- 
tion the function into convex regions. 

Each of these two stages may require the reformula- 
tion of the objective function. In mathematical do- 
mains, the reformulation task consists in applying alge- 
braic transformations to determine the “appropriate” 
format of an expression. To accomplish this task, alge- 
braic operations are treated as operators that modify 
the function. Reformulation is also accomplished by a
representational shift which changes the coordinate
system from, say, polar to cartesian. Further
reformulations take place by choosing the appropriate
origin, scale, and orientation of the coordinate
system. Each of these operations can be considered as a
reformulation of the original expression. As in most
other reformulation tasks, an exhaustive search of all
possible reformulations is infeasible. Therefore, one
must devise techniques to control the search.

In our research the need for reformulation arose 
during the elimination of independent variables. In 
some cases, the original objective function was given 
in cartesian coordinates. This representation did not 
allow any simplification. On the contrary, reformulat- 
ing the function in a different coordinate system and 
performing algebraic simplifications allowed the elim- 
ination of independent variables from the optimiza- 
tion process. This was possible because the reformu- 
lation in polar coordinates revealed regularities among 
variables. The regularities were detected during the 
search and incorporated into the function represented 
in the new coordinate system. Regularities are de- 
tected by using a domain theory and heuristics such 
as find equal angles. These heuristics detect reg- 
ularities only when the appropriate representation is 
chosen. In our solution [Cerbone, 1992], the search for
the “correct” representation is aided by classifying
geometric entities (angles, points, lines, etc.) by
type. These types are then related to changes in the
representations.

A second important application of reformulation 
techniques to optimization problems is the automatic 
discovery of abstractions. In our solutions, we have 
used abstractions to partition the original optimiza- 
tion task into independent sub-problems over convex 
regions. In mathematical domains, the abstraction is 
usually a function of the independent variables that 
represents a change among stable states of the physi- 
cal system. From a graphical standpoint, these changes 
correspond to multi-dimensional ridges that separate 
convex regions. To determine these abstractions the
system must first find the singularities of the
objective function and then synthesize these findings
into a single expression. Given the parametric nature
of the expression, this task requires interpolation
over a multi-dimensional parametric space. This is a
complex task. However, if one takes into account the
physical meaning of the regions of stability, it turns
out that the determination of the state changes is a
simpler process. In fact, we have used engineering
intuition to partition the optimization problem into
convex regions. Details of this process are contained
in [Cerbone, 1992].

In conclusion, in this brief overview we have pre- 
sented a few challenges that mathematical tasks 
present to reformulation. We claim that mathemati- 
cal tasks are an ideal domain for reformulation tech- 
niques since they provide well-defined operators for 
representational shifts and the possibility of measur- 
ing the usefulness of a change of representation. On 
the other hand, mathematical domains pose formidable
challenges that include a continuous search space and
bridging the gap between numerical data and a higher
level language that is closer to the experts’ intuition.



Figure 1: Problem solving strategies in numerical
optimization.

1. Synthesis of Search Control
Heuristics

One portion of my research has focused on auto- 
matic synthesis of search control heuristics for con- 
straint satisfaction problems (CSPs). I have developed 
techniques for automatically synthesizing two types of 
heuristics for CSPs: Filtering functions are used to re- 
move portions of a search space from consideration. 
Evaluation functions are used to order the remain- 
ing choices. My techniques operate by first construct- 
ing exactly correct filters and evaluators. These oper- 
ate by exhaustively searching an entire CSP problem 
space. Abstracting and decomposing transformations 
are then applied in order to make the filters and eval- 
uators easier to compute. An abstracting transforma- 
tion replaces the original CSP problem space with a 
smaller abstraction space. A decomposing transfor- 
mation splits a single CSP problem space into two 
or more subspaces, ignoring any interactions between 
them. Both types of transformation potentially intro- 
duce errors into the initially exact filters and evalua- 
tors. The transformations thus implement a tradeoff 
between the cost of using filters and evaluators, and the 
accuracy of the heuristic advice they provide. I have 
shown these techniques to be capable of synthesizing 
useful heuristics in domains such as floor-planning and 
job-scheduling, among others. (See [Ellman, 1992].) 

2. Synthesis of Hierarchic Problem 
Solving Algorithms 

Another portion of my research is focused on automatic 
synthesis of hierarchic algorithms for solving constraint 
satisfaction problems (CSPs). I have developed a tech- 
nique for constructing hierarchic problem solvers based 
on numeric interval algebra. My system takes as inputs
a candidate solution space S and a constraint C on
candidate solutions. The solution space S is assumed
to be a cartesian product R^n where R is a set of integers.
The constraint C is assumed to be represented in
terms of arithmetic, relational and boolean operations.
From these inputs the system constructs an abstract
solution space S_a as a cartesian product R_a^n where R_a
is a set of disjoint intervals that covers R. The system
also constructs an abstract constraint C_a on abstract
solutions. The abstract constraint C_a is obtained from
the original constraint C by replacing ordinary arithmetic
operations with interval algebra operations and
replacing boolean operations with boolean set operations.
The abstract space S_a and abstract constraint
C_a are then used to build a hierarchic problem solver
that operates in two stages. The first stage finds an
abstract solution in the space S_a of intervals. The second
stage refines the abstract solution into a concrete
solution in the original search space S. I have shown
this approach to be capable of synthesizing efficient 
problem solvers in domains such as floor-planning and 
job-scheduling, among others. (See [Ellman, 1992].) 
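
The two-stage scheme described above can be illustrated with a minimal Python sketch. It is not the author's system: the function names, the fixed-width partition of R, and the toy constraint (x + y = 17 with x < y) are assumptions made for illustration, and the boolean set operations of the abstract constraint are simplified here to a "may be satisfied" test.

```python
from itertools import product

def add(i, j):
    """Interval addition: [a, b] + [c, d] = [a + c, b + d]."""
    return (i[0] + j[0], i[1] + j[1])

def may_equal(i, c):
    """Abstract '==': some point of interval i can equal c."""
    return i[0] <= c <= i[1]

def may_be_less(i, j):
    """Abstract '<': some point of i can be smaller than some point of j."""
    return i[0] < j[1]

def solve(n, lo, hi, width, abstract_ok, concrete_ok):
    # R_a: disjoint intervals of at most `width` integers covering R = [lo, hi].
    intervals = [(a, min(a + width - 1, hi)) for a in range(lo, hi + 1, width)]
    for box in product(intervals, repeat=n):        # stage 1: search S_a
        if not abstract_ok(box):
            continue                                # no concrete solution inside
        ranges = [range(a, b + 1) for a, b in box]
        for point in product(*ranges):              # stage 2: refine within the box
            if concrete_ok(point):
                return point
    return None

# Toy constraint C (hypothetical): x + y == 17 and x < y, with R = {0, ..., 15}.
concrete = lambda p: p[0] + p[1] == 17 and p[0] < p[1]
abstract = lambda b: may_equal(add(b[0], b[1]), 17) and may_be_less(b[0], b[1])

solution = solve(2, 0, 15, 4, abstract, concrete)
```

Because the abstract test only rejects a box when no point inside it can satisfy C, stage 1 prunes without losing concrete solutions; the saving comes from discarding whole boxes at once.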

3. Decomposition in Design 
Optimization 

Another portion of my research is focused on auto- 
matic decomposition of design optimization problems. 
We are using the design of racing yacht hulls as a 
testbed domain for this research. Decomposition is 
especially important in the design of complex physi- 
cal shapes such as yacht hulls. Exhaustive optimiza- 
tion is impossible because hull shapes are specified 
by a large number of parameters. Decomposition di- 
minishes optimization costs by partitioning the shape 
parameters into non-interacting or weakly-interacting 
sets. We have developed a combination of empiri- 
cal and knowledge-based techniques for finding use- 
ful decompositions. The knowledge-based method ex- 
amines a declarative description of the function to be 
optimized in order to identify parameters that poten- 
tially interact with each other. The empirical method 
runs computational experiments in order to determine 
which potential interactions actually do occur in prac- 
tice. We expect this approach to find decompositions 
that will result in faster optimization, with a minimal 
sacrifice in the quality of the resulting design. Imple- 
mentation and testing of this approach are currently in 
progress. (I am pursuing this research in collaboration 
with Mark Schwabacher.) (See [Ellman et al., 1992].)
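
The text does not say which computational experiments the empirical method runs. As one hedged illustration, the sketch below uses a finite-difference test at a single design point: two parameters are treated as interacting when changing one alters the effect of changing the other. The function names, the toy objective, and the test itself are assumptions, not the method of [Ellman et al., 1992].

```python
def interacts(f, x, i, j, delta=1.0, tol=1e-9):
    """Hypothetical experiment: does moving parameter i change the effect of moving j?"""
    def bump(idxs):
        return [v + delta if k in idxs else v for k, v in enumerate(x)]
    mixed = f(bump({i, j})) - f(bump({i})) - f(bump({j})) + f(x)
    return abs(mixed) > tol

def decompose(f, x, candidate_pairs):
    """Group parameters so that only empirically interacting ones share a group."""
    groups = [{k} for k in range(len(x))]
    for i, j in candidate_pairs:            # pairs proposed by the knowledge-based step
        if interacts(f, x, i, j):
            gi = next(g for g in groups if i in g)
            gj = next(g for g in groups if j in g)
            if gi is not gj:
                groups.remove(gj)
                gi |= gj
    return groups

# Toy objective (an assumption): parameters 1 and 2 interact, parameter 0 does not.
objective = lambda p: p[0] ** 2 + p[1] * p[2]
groups = decompose(objective, [1.0, 2.0, 3.0], [(0, 1), (1, 2), (0, 2)])
```

A test at one point is only local evidence; a practical version would presumably repeat it over several sampled designs before trusting a decomposition.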
4. Model Selection in Design
Optimization 

Another portion of my research is focused on intelligent 
model selection in design optimization. The model se- 
lection problem results from the difficulty of using ex- 
act models to analyze the performance of candidate 
designs. For example, in the domain of racing yacht 
design, an exact analysis of a yacht’s performance 
would require a computationally expensive solution of 
the Navier-Stokes equations. Approximate models are 
therefore needed in order to diminish the costs of analyzing
and evaluating candidate designs. In many situations,
more than one approximate model is available.
For example, in the yacht design domain, the induced
resistance of a yacht can be predicted by solving
Laplace’s equation (an approximation of Navier-Stokes)
or by using a simple algebraic formula. The two
approximations differ widely in both the costs of computation
and the accuracy of the results. Intelligent
model selection techniques are therefore needed to de- 
termine which approximation is appropriate during a 
given phase of the design process. 

We have attacked the model selection problem in 
the context of hillclimbing optimization. We have developed
a technique which we call "gradient magnitude
based model selection". This technique is based on the 
observation that a highly approximate model will of- 
ten suffice when climbing a steep slope, because the 
correct direction of change is easy to determine. On 
the other hand, a more accurate model will often be 
required when climbing a gradual incline, because the 
correct direction of change is harder to determine. Our 
technique operates by comparing the estimated error 
of an approximation to the magnitude of the local gra- 
dient of the function to be optimized. An approxima- 
tion is considered acceptable as long as the gradient 
is large enough, or the error is small enough, so that 
each proposed hillclimbing step is guaranteed to im- 
prove the value of the goal function. Implementation 
and testing of this approach are currently in progress. 
I am pursuing this research in collaboration with John 
Keane. (See [Ellman et al., 1992].)
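
The comparison between estimated error and gradient magnitude can be sketched in one dimension as follows. This is a simplified illustration rather than the implementation under test: the model names, the error bounds, and the toy goal function are assumptions, and the error is taken to be a bound on each model's gradient estimate.

```python
def select_model(models, x):
    """Pick the cheapest model whose gradient estimate exceeds its error bound at x.

    `models` is ordered from cheapest to most expensive; each entry is
    (name, gradient_function, gradient_error_bound).
    """
    for name, grad, err in models:
        if abs(grad(x)) > err:      # the sign of the true slope is then certain
            return name
    return models[-1][0]            # near a flat spot: use the most accurate model

# Toy goal function f(x) = -(x - 3)^2, so the exact gradient is -2(x - 3).
exact_grad = lambda x: -2.0 * (x - 3.0)
cheap_grad = lambda x: exact_grad(x) + 0.5      # crude model, off by at most 0.5

models = [("algebraic", cheap_grad, 0.5), ("accurate", exact_grad, 0.01)]

far = select_model(models, 0.0)     # steep slope
near = select_model(models, 3.1)    # gradual incline
```

On the steep part of the slope the crude model already fixes the direction of improvement; close to the optimum its error swamps the gradient, so the selector falls back on the more expensive model.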
In work done jointly with Toby Walsh, the author has
provided a sound theoretical foundation to the process
of reasoning with abstraction [GW90c; GW89;
GW90b; GW90a]. The notion of abstraction formalized
in this work can be informally described as:

[property 1] the process of mapping a representation
of a problem, called (following historical
convention [Sac74]) the “ground” representation,
onto a new representation, called the “abstract”
representation, which:

[property 2] helps deal with the problem in the
original search space by preserving certain
desirable properties and

[property 3] is simpler to handle as it is constructed
from the ground representation by “throwing
away details”.

One desirable property preserved by an abstraction is
provability; often there is a relationship between provability
in the ground representation and provability in
the abstract representation. Another can be deduction
or, possibly, inconsistency. By “throwing away details”
we usually mean that the problem is described
in a language with a smaller search space (for instance
a propositional language or a language without variables)
in which formulae of the abstract representation
are obtained from the formulae of the ground representation
by the use of some terminating rewriting technique.
Often we require that the use of abstraction
results in more efficient reasoning. However, it might
simply increase the number of facts asserted (e.g., by
allowing, in practice, the exploration of deeper search
spaces or by implementing some form of learning).

Among all abstractions, three very important classes
have been identified. They relate the set of facts provable
in the ground space to those provable in the abstract
space. We call:

• TI abstractions all those abstractions where the
abstractions of all the provable facts of the ground
space are provable in the abstract space;

• TD abstractions all those abstractions where the
“unabstractions” of all the provable facts of the
abstract space are provable in the ground space;

• TC abstractions all those abstractions where a fact
is provable in the ground space if and only if its
abstraction is provable in the abstract space.
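
The three classes can be stated compactly. The notation below is an assumption introduced for illustration and is not taken from the text: $f$ denotes the abstraction mapping on formulae, and $\vdash_g$ and $\vdash_a$ denote provability in the ground and abstract spaces respectively.

$$
\begin{aligned}
\text{TI:}\quad & \vdash_g \varphi \;\Rightarrow\; \vdash_a f(\varphi) \\
\text{TD:}\quad & \vdash_a f(\varphi) \;\Rightarrow\; \vdash_g \varphi \\
\text{TC:}\quad & \vdash_g \varphi \;\Leftrightarrow\; \vdash_a f(\varphi)
\end{aligned}
$$

Read this way, an abstraction is TC exactly when it is both TI and TD.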

Historically the word abstraction has been mainly used 
with a much more restricted meaning which captures 
its use in problem solving and planning (for instance 
in Abstrips or Soar). Our notion of abstraction (and
in particular the three classes defined above) turns out
to capture and provide a unifying framework for describing
work done in the definition of decision procedures
(see for instance [DG79; Giu91]), in planning
and problem solving (see for instance [Sac73; Ell90;
MH91; Kno89]), explanation (see for instance [Doy86]),
common sense reasoning (see for instance [Hob85]),
qualitative and model based reasoning (see for instance
[Moz90; Wel91]), approximate reasoning (see for
instance [Imi87]), analogy (see for instance [Ble90]) and
reasoning with very large data bases (see for instance [Lev92]).

On closer inspection, abstraction also seems closely related to
problem reformulation. In particular it seems that
problem reformulation can be characterized as using
some of the subclasses of TC and TD abstractions
introduced in [GW90c]. Confirmation of this
intuition would allow us to use the framework described
in [GW90c; GW89] to put the work on problem reformulation
on a more solid ground and, at the same time,
to study and compare the techniques used in problem 
reformulation with the techniques used in all the other 
areas captured by the framework. 
Abstract

Selecting a good bias prior to concept learning 
can be difficult. Therefore, dynamic bias adjustment is
becoming increasingly popular. Current dynamic bias 
adjustment systems, however, are limited in their abil- 
ity to identify erroneous assumptions about the rela- 
tionship between the bias and the target concept. 
Without proper diagnosis, it is difficult to identify and 
then remedy faulty assumptions. We have developed 
an approach that makes these assumptions explicit, 
actively tests them with queries to an oracle, and 
adjusts the bias based on the test results. 

1 Introduction 

Bias is a fundamental aspect of any supervised 
concept learner. Numerous papers have noted this 
importance (e.g., Mitchell 1980; Haussler 1988). The 
type of bias that we discuss here is the choice of a 
hypothesis language. The hypothesis language defines 
the space of hypotheses. A strong bias defines a small 
hypothesis space; a weak bias defines a large 
hypothesis space; a correct bias defines a space that 
includes the target concept. A strong correct bias, e.g.,
one with fewer features, is generally desirable because
it reduces the number of hypothesis choices and
thereby promotes rapid convergence to the target concept.

The bias can be adjusted (shifted) dynamically 
during incremental concept learning by strengthening 
the bias when possible and weakening it to regain 
correctness. Recently, interest has grown in systems 
that dynamically shift the bias (e.g., Utgoff 1986; Ren- 
dell 1990; Spears & Gordon 1991). These systems, 
however, are limited in their ability to identify errone- 
ous assumptions about the relationship between the 
bias and the target concept. Proper diagnosis aids in
the recovery from faulty assumptions. We have 
developed an approach to bias adjustment that 
addresses this need for proper diagnosis. Our method 
consists of a bias tester and adjuster that can be added 


to an incremental concept learner to improve the 
learner’s performance. 

Unlike previous approaches to bias testing, our 
approach uses formal definitions of assumptions about 
the bias, called biasing assumptions, to guide an 
analysis of why the bias is inappropriate (e.g., too 
weak, or incorrect) for learning the target concept. 
An example of a biasing assumption is the irrelevance
of a feature for learning the target concept. The bias
tester performs this analysis (called a biasing assumption
test) by actively testing the bias with queries to an
oracle. Each query is a request to an instance generator
for a new instance. For example, the irrelevance of
a feature might be tested by querying an oracle for the
class (positive/negative) of instances having different
values of that feature. The bias adjuster then records
the analysis results and adjusts the bias accordingly. If
a biasing assumption holds, the adjuster strengthens
the bias, e.g., by removing the irrelevant feature from
the hypothesis language. Otherwise, the adjuster
weakens the bias or allows the bias to stay the same if
no adjustments are needed.

Our approach has three primary advantages. 
First, the bias tests are composed of queries. Queries 
can accelerate learning significantly (see Gordon 
1990; 1992). Second, our approach is designed to be 
incorporated into an existing concept learner. Third, 
our approach diagnoses the bias to find and record 
specific erroneous biasing assumptions. This enables 
the bias to be minimally weakened as well as 
corrected. Minimal weakening is most advantageous 
when a stronger bias is desirable. In that case, bias 
strengthening along with minimal bias weakening can 
enable very rapid acquisition of the target concept (see
Gordon 1990; 1992). 

In our framework, the bias is the set of features
and their values in the hypothesis language. These
values appear in value trees (e.g., see Figure 1), which
are input by a user or knowledge engineer who is
somewhat familiar with the domain. Value trees are
typically called generalization trees because parent
nodes are more general than their child nodes. Training
instances are described in terms of leaf node
values. Throughout this paper, we assume the concept
learner begins with hypotheses described in terms of
the instance language and evolves its hypotheses
(perhaps using the value trees) in a specific-to-general
direction. Generalization increases the generality of 
values within a particular hypothesis; abstraction 
increases the generality of the hypothesis language. 
The concept learner can use value trees for generaliza- 
tion. Our approach to bias testing and adjustment uses 
value trees for abstraction. Bias strengthening implies 
removal (i.e., abstraction) of a feature or feature value 
distinction from the hypothesis language. This shrinks 
the hypothesis space. Bias weakening implies the res- 
toration of features or feature value distinctions. This 
weakening undoes abstraction and enlarges the 
hypothesis space. Bias weakening is defined to be 
minimized when the features and feature value distinc- 
tions that are restored to the language are restricted to 
those that must be restored to correct the bias. 

The drawback of our approach is that it requires 
an oracle that can respond to queries during learning. 
The oracle can be either a human or the environment. 
In either case, it is not always practical to require an 
oracle. Humans may be too busy to answer questions. 
Furthermore, the use of the environment as an oracle is 
impractical if lives are at stake. For example, it is 
unreasonable to query whether a new chemical 
weapon is effective at killing people. On the other 
hand, it is practical to query whether small doses of 
Vitamin C cure the common cold. 

Using Figure 1, we can see how a bias may be 
strengthened, weakened, and minimally weakened. 
Suppose the bias is all the trees in Figure 1, and the 
target concept states that small bricks are positive and 
instances of any other description are negative. The 
bias might be strengthened by removing all features 
other than “size” from the hypothesis language. This 
bias is incorrect because “shape” information is also 
needed to learn the target concept. One way to 
weaken and correct the bias is to restore the original 
language. Alternatively, we can minimally weaken 
the bias by restoring parts of the “shape” value tree 
but none of the “material” tree. Within the “shape” 
tree, we restore the “cube”/“brick” distinction and
above, and restore the “curved-solid” node, but do 
not restore any child of the “curved-solid” node. 
Removing a distinction strengthens the bias to create 
an abstraction, whereas restoring it weakens the bias 


to undo the abstraction. 

Section 2 formally defines two important bias- 
ing assumptions and then collapses them into one. 
Section 3 presents and analyzes algorithms to test the 
collapsed assumption. Section 4 summarizes and 
explains empirical results. Finally, Sections 5 and 6 
present related work and a summary of the paper. 

2 Biasing Assumptions 

When supervised concept learners shift their 
bias, they typically make an implicit biasing assump- 
tion that the bias shift is correct for learning the target 
concept. Our approach makes each biasing assumption
explicit and associates each assumption with an
abstraction operator. If the assumption holds, the 
corresponding abstraction operator can fire. 

We assume two abstraction operators: climb-value-tree(f,a)
and remove-feature(f). The climb-value-tree(f,a)
operator replaces values of feature f
that are lower in the value tree (e.g., “cube” and
“brick” in Figure 1) with a value a (e.g., “prism”)
that is higher in the tree throughout the hypothesis
language. The remove-feature(f) operator eliminates
feature f from the hypothesis language. We associate a
cohesion assumption with climb-value-tree(f,a).
Cohesion implies that the values below a in the value
tree of f are unnecessary for predicting the target concept
membership of instances. We associate an
irrelevance assumption, which is equivalent to cohesion
at the root node of a value tree, with
remove-feature(f). Irrelevance implies that the feature to be
removed is unnecessary for predicting the target concept
membership of instances.
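
The two operators can be sketched in Python as follows. This is an illustration under stated assumptions: Figure 1 is not reproduced in the text, so the shape tree below is hypothetical apart from the nodes the text names (cube, brick, prism, curved-solid), and a hypothesis is represented as a list of disjuncts, each a dictionary from feature to value. Applying the operators to a hypothesis rather than to the whole language is a simplification.

```python
# Hypothetical value tree for "shape", stored as child -> parent.
SHAPE_PARENT = {
    "cube": "prism", "brick": "prism",
    "sphere": "curved-solid", "cylinder": "curved-solid",
    "prism": "any-shape", "curved-solid": "any-shape",
}

def is_below(parent, value, ancestor):
    """True if `value` lies strictly below `ancestor` in the value tree."""
    while value in parent:
        value = parent[value]
        if value == ancestor:
            return True
    return False

def climb_value_tree(hypothesis, feature, a, parent):
    """Replace every value of `feature` that lies below `a` with `a` itself."""
    return [
        {f: (a if f == feature and is_below(parent, v, a) else v)
         for f, v in disj.items()}
        for disj in hypothesis
    ]

def remove_feature(hypothesis, feature):
    """Drop `feature` from every disjunct."""
    return [{f: v for f, v in disj.items() if f != feature} for disj in hypothesis]

hyp = [{"size": "small", "shape": "brick"}, {"size": "large", "shape": "sphere"}]
climbed = climb_value_tree(hyp, "shape", "prism", SHAPE_PARENT)
without_size = remove_feature(hyp, "size")
```

Climbing to "prism" erases the cube/brick distinction but leaves "sphere" untouched, since it is not below "prism" in the assumed tree.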

The following are the formal definitions of the 
two biasing assumptions. These definitions are 
tailored for an incremental concept learning context.
We assume one new instance is accepted at a time and
all previous instances are saved. Furthermore, we
assume that the instances are not noisy and that the
instance features are sufficient to distinguish positive
from negative instances, though perhaps not ideal for
learning the target concept. We abbreviate the set of
all known positive instances at time t with POS(t), the
set of all known negative instances at time t with
NEG(t), the set of all positive instances with POS, and
the set of all negative instances with NEG. We abbreviate
the new instance at time t with i(t), the target
concept with TC, the irrelevance biasing assumption
with IRR(f,TC,t) for feature f, and the cohesion biasing
assumption with COH(a,TC,t) for value a.
FIG. 1. Value trees.
For the following definitions, if i(t) is positive,
we let L(t) = POS(t) ∪ {i(t)} and L = POS, or we let
L(t) = NEG(t) and L = NEG. Likewise, if i(t) is negative,
we let L(t) = NEG(t) ∪ {i(t)} and L = NEG, or we
let L(t) = POS(t) and L = POS.

Let {f_1, ..., f_n} be the set of features considered
relevant as of time (t - 1). Let 1 ≤ i ≤ n, where i is the
subscript used in the following definitions. Finally, we
define f_i(x,v_i) to mean that the value of feature f_i for
instance x is v_i. Although the instance language consists
of value tree leaf nodes, a nonleaf node can also
be used to describe an instance, though not uniquely.
We allow v_i to be either a leaf or nonleaf node in the
following definitions. The formal definition of the
irrelevance biasing assumption is:


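$$
\begin{aligned}
\mathrm{IRR}(f_i, TC, t) \leftrightarrow{} & ((\forall v_1, \ldots, v_n)(((\exists x \in L(t))(f_1(x,v_1) \,\&\, \cdots \,\&\, f_i(x,v_i) \,\&\, \cdots \,\&\, f_n(x,v_n))) \rightarrow \\
& ((\forall w_i)(\forall y)((f_1(y,v_1) \,\&\, \cdots \,\&\, f_i(y,w_i) \,\&\, \cdots \,\&\, f_n(y,v_n)) \\
& \rightarrow (y \in L))))).
\end{aligned}
$$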

In other words, f_i is considered irrelevant to learning
TC at time t if changing the value of f_i in any known
instance x always yields a (new or old) instance whose
classification (positive/negative) is the same as that of
x.

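Next, we define the cohesion biasing assumption.
The cohesion of value a with respect to the target
concept, COH(a,TC,t), means that the descendent
nodes (a_1, ..., a_m) below value (node) a in the value
tree appear to behave equivalently with respect to target
concept membership. Let A = {a_1, ..., a_m}. Let
1 ≤ i ≤ n and 1 ≤ j, k ≤ m. The formal definition is:

$$
\begin{aligned}
\mathrm{COH}(a, TC, t) \leftrightarrow{} & ((\forall v_1, \ldots, v_n)(\forall a_j \in A)(((\exists x \in L(t)) \\
& (f_1(x,v_1) \,\&\, \cdots \,\&\, f_i(x,a_j) \,\&\, \cdots \,\&\, f_n(x,v_n))) \rightarrow \\
& ((\forall a_k \in A)(\forall y)((f_1(y,v_1) \,\&\, \cdots \,\&\, f_i(y,a_k) \,\&\, \cdots \,\&\, f_n(y,v_n)) \\
& \rightarrow (y \in L))))).
\end{aligned}
$$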

In other words, cohesion holds for value a for learning 
TC at time t if the replacement of one descendent 
value of a with another descendent value in any 
known instance x always yields a (new or old) instance 
whose classification is the same as that of x. Note that 
irrelevance is a special case of cohesion that occurs 
when a is the root node of a value tree. Therefore, 
these two assumptions can be collapsed into one. Let 
us call the collapsed assumption IRR-COH(a,TC,t).
The definition of this collapsed assumption is identical 
to that of COH(a,TC,t). 
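
Read operationally, the collapsed assumption says that swapping one descendant of a for another never changes an instance's class. The sketch below checks this exhaustively against an oracle. It is an illustration rather than the authors' test: the function name, the material values ("steel", "brass") and the restriction to leaf values are assumptions, while the target concept (small bricks are positive) is the one used in the running example.

```python
def irr_coh_holds(known, feature, descendants, oracle):
    """Check IRR-COH for one feature against an oracle.

    known       -- list of (instance, label) pairs seen so far
    descendants -- the set A of values below the candidate abstraction a
    oracle      -- function returning the true label of any instance
    """
    for x, label in known:
        if x[feature] not in descendants:
            continue                        # x does not carry any a_j in A
        for a_k in descendants:
            y = {**x, feature: a_k}         # same instance with a_j replaced by a_k
            if oracle(y) != label:
                return False                # a distinction below a matters
    return True

# Target concept from the running example: small bricks are positive.
oracle = lambda inst: inst["size"] == "small" and inst["shape"] == "brick"
known = [({"size": "small", "shape": "brick", "material": "steel"}, True)]

material_ok = irr_coh_holds(known, "material", {"steel", "brass"}, oracle)
shape_ok = irr_coh_holds(known, "shape", {"cube", "brick"}, oracle)
```

Here the material values are interchangeable, so the assumption holds for "material", whereas replacing "brick" with "cube" flips the class, so cohesion fails for the parent of those two values.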

3 Queries for Testing the Biasing 
Assumptions 

The definition of IRR-COH(a,TC,t) presented
in the last section has been translated into algorithmic 
biasing assumption tests. This section presents the 
algorithms for these tests. There are two types of 
biasing assumption tests, corresponding to the two
times at which tests are executed. Each type is associ- 
ated with a separate algorithm. One type of test (in 
Section 3.1) executes before bias shifting, and the 
other type (in Section 3.2) executes after bias shifting. 

Like the definition on which they are based, our 
biasing assumption tests are tailored for incremental 
concept learning. If an assumption test is satisfied, 
then the corresponding biasing assumption is con- 
sidered valid, and therefore it is “safe” to implement 
the abstraction corresponding to this assumption. If 
the test is not satisfied, corrective action may be 
required. 

We assume that our approach to bias testing and 
shifting is added to an incremental concept learner that 
maintains two Disjunctive Normal Form (DNF) 
hypotheses: one that covers all previously seen posi- 
tive instances and one that covers all previously seen 
negative instances. The flow of control begins when a 
query to the instance generator requests a new 
instance. When the new instance is received by the 
concept learner, the learner uses its hypotheses to 
predict the class of this instance. The learner then 
consults an oracle to find out the true class of the 
instance. If enough instances have been seen at this 
time to complete an assumption test, the bias is shifted 
according to the test results. Next, the learner updates 
its hypotheses to preserve completeness and con- 
sistency. Completeness implies the positive 
hypothesis covers all known positive instances and the 
negative hypothesis covers all known negative 
instances. Consistency implies the positive hypothesis 
covers no known negative instances and the negative 
hypothesis covers no known positive instances. These 
steps are repeated until the user decides the target con- 
cept has been learned. For more details see (Gordon 
1992). 

We introduce four types of queries to facilitate 
concept learning with bias shifts: bias strengthening 
queries, bias weakening queries, counterexample 
queries, and random instance queries. An assumption 
test is a sequence of bias strengthening queries or a 
sequence of bias weakening queries. Bias strengthen- 
ing queries test abstractions before they are made; bias 
weakening queries retest abstractions after they have 
been made. The last two types of queries are not part 
of assumption tests, but they are useful for other rea- 
sons. The purpose of counterexample queries is to find 
out whether the bias is incorrect. If an incorrectness is
found, the bias weakening queries then determine why
the bias is incorrect. The purpose of the random


instance queries is to generate instances for concept 
learning when none of the other queries applies. All 
queries except the bias weakening queries request 
instances not previously seen. Bias weakening queries 
try to use previously seen instances before generating 
new ones because they retest previously held assump- 
tions, and the necessary instances to do this are often 
already present.

Random instance and counterexample queries 
are simple, so we describe them first. Random
instance queries are requests to the instance generator 
for randomly generated (previously unseen) instances. 
Counterexample queries can provide counterexamples 
because they are requests for randomly generated 
(unseen) instances that are covered by one of the 
hypotheses. A negative instance covered by the posi- 
tive hypothesis is a counterexample, and a positive 
instance covered by the negative hypothesis is a coun- 
terexample. 

3.1 Bias Strengthening Queries 

Bias strengthening queries test whether the bias- 
ing assumption associated with a potential abstraction 
holds. To do this, these queries test nodes of the value 
tree below the potential abstraction. If these values do 
not seem useful for distinguishing target concept 
membership, the abstraction is made. 

The assumption tests that use bias strengthening 
queries and are executed prior to abstraction may be as 
rigorous as desired. Tests that use more queries are 
more rigorous. Gordon (1990) describes a method for
varying the rigor of these tests. Increasing the rigor
can reduce the number of prediction errors, but it can
also significantly increase the cost of the tests. For
example, suppose we wish to test the cohesion of
value a of feature f. Let us consider how we may vary
the rigor of tests that are based on the formal definition
of IRR-COH(a,TC,t) in Section 2. We can increase
the rigor with which we test the cohesion assumption
by the following two methods: (1) Increase the
number of values a_k from A to substitute for the original
value a_j before assuming cohesion holds; (2)
Increase the number of instances x whose f value is
varied before assuming cohesion holds. (Note that
each x corresponds to a unique choice of values for v_1
through v_n.) Either of these methods will increase the
number of queries.

Here, we describe an algorithm that does not 
have very rigorous assumption tests and is therefore 
not excessively costly. It is not very rigorous because 
only one sibling of the original value is tested for each 
hypothesis disjunct before making an abstraction, and
because hypothesis disjuncts, rather than instances,
have their values varied. There are typically far fewer
disjuncts than instances. Our algorithm for generating 
bias strengthening queries to test biasing assumptions 
is the following: 

Repeat the following until no more unseen, uncovered
instances i' can be generated:

For each instance feature f do

    Find the value a that is the parent node of the
    value of f in some (arbitrary) disjunct of one of
    the two hypotheses. The value a is a potential
    abstraction to be tested. If this value has been
    tested previously, then select another value for
    a from another disjunct.

    For each (positive, negative) hypothesis h do
        For each disjunct disj of h do

            (1) Find f(x,v), a conjunct of disj. If
            none exists, or if v is not a child of a, try
            another disjunct.

            (2) Set SIBLINGS equal to the set of all
            siblings (which share a parent node a) of
            v in the value tree of f. These siblings
            are children of a.

            (3) Select s ∈ SIBLINGS.

            (4) Replace “f(x,v)” in disj with
            “f(x,s)” to form d'. Then form a potential
            instance i' that satisfies d' by
            translating d' to the language of the
            instances, which consists of leaf values
            in the value trees. When translating to
            leaf values, the choice of a descendent
            of a higher level value is random.
            Check to see that i' is not already
            covered by the hypotheses and has not
            been seen yet. If i' is uncovered and
            unseen, request i' from the instance generator
            and continue. Otherwise, try
            other choices of leaf value descendents
            until they have all been tried. If the descendents
            have all been tried, then go to
            step (3) to find another sibling.

            (5) Accept i' from the instance generator
            and consult the oracle for the class of
            i'.

        Endfor
    Endfor

    (If the assumption test for abstraction a has
    succeeded at this point, the bias adjuster makes
    the abstraction.)

Endfor


This algorithm executes a sequence of biasing 
assumption tests. Each biasing assumption test 
corresponds to a test of an abstraction a. This test con- 
sists of a sequence of queries that vary the values in 
the hypothesis disjuncts. To ensure that these queries 
do not overlap with the other query types, bias 
strengthening queries only request instances that are 
not covered by either of the hypotheses and have not 
yet been seen. To form each bias strengthening query, 
this algorithm selects one feature f of one disjunct of
one hypothesis h and alters the value of this feature.
This is a form of perturbation (Porter & Kibler 1986).
The value of f that is perturbed is a child, i.e., an
immediate descendent, of the abstraction a in the 
value tree. The new value obtained through perturba- 
tion is a sibling of the original value, i.e., both values 
are children of a in the value tree. This new value is 
substituted for the old value in the hypothesis disjunct, 
and an instance that matches this description and has 
random values for unspecified features is requested. 

Let us assume the algorithm is testing abstrac- 
tion a. For each hypothesis h, suppose perturbing the 
value of / in all disjuncts of h yields only instances 
whose class is the same as that of h. Then nodes in the 
value tree below a do not seem to be useful for distin- 
guishing positive from negative instances (though this 
assumption might later be proven wrong). Thus, the 
abstraction to a is permissible and can therefore be 
made by the bias adjuster. In other words, the biasing 
assumption test for abstraction a is satisfied. On the 
other hand, if an instance of a different class is gen- 
erated, the abstraction cannot be made because the 
biasing assumption test is not satisfied. 

Because we assume the hypotheses begin with 
the language of the instances, which consists of leaf 
values in the value trees, this algorithm tests abstrac- 
tions one value tree level at a time, beginning one 
level up from the leaves. For example, when using the 
“material” tree of Figure 1, the language shift to 
“alloy” would be tested before the language shift to 
the root node “any material”. 

We can now see how this algorithm corresponds 
to the formal definition of IRR-COH(a,TC,t) in Section 2.
For each disjunct d of hypothesis h, we let x
(from the assumption definition) be any instance
covered by d. We also let v_1 through v_n be the feature
values present in d. To perturb the value of feature f,
our algorithm selects an a_k for the sibling value. If the
substitution of a_k into d yields a new instance whose
classification differs from that of x, the assumption
being tested does not hold. On the other hand, if the
substitution of a_k’s into every disjunct of h yields only
instances of the same class as x, our assumption passes
the test and is considered to hold. The perturbation
values a_k are children of node a in a value tree, where
a is the abstraction being tested. If a is the root node of
the value tree, our algorithm tests the irrelevance of f.
Otherwise, our algorithm tests the cohesion of a.

An upper bound on the number of bias 
strengthening queries generated by this algorithm is 
O(F * D * d), where F is the number of instance 
features, D is the maximum number of disjuncts in the 
two hypotheses, and d is the maximum depth of a 
value tree. The branching factor of the value tree is 
not included in the cost of this algorithm because to 
test each abstraction only one sibling value is 
requested for each disjunct. 

To illustrate bias strengthening queries, suppose 
we are testing the feature “size”, where the features 
and trees of Figure 1 are used. Furthermore, suppose 
feature “material” is considered irrelevant and has 
been removed from the hypothesis language, and the 
current hypotheses are: 

POS HYP: { x | 
    ((size(x,small) & shape(x,brick)) 
     v 
     (size(x,large) & shape(x,sphere))) } 

NEG HYP: { x | 
    (size(x,medium) & shape(x,cylinder)) }. 

Then a bias strengthening query to test the abstraction 
to “any size” is formed by using the first disjunct of 
the positive hypothesis to request a medium brick (or 
large brick) that has a randomly chosen value for 
“material”. Then the second disjunct of the positive 
hypothesis is used to request a small sphere (or 
medium sphere), and the only disjunct of the negative 
hypothesis is used to request a small cylinder (or 
large cylinder), each with randomly chosen values for 
“material”. If the first two instances are positive and 
the third instance is negative, then “size” is con- 
sidered irrelevant and is removed from the hypothesis 
language by the bias adjuster. An abstraction is 
created when “size” disappears from the hypothesis 
language. On the other hand, if any of the requested 
instances has a different classification than the 
hypothesis from which it was derived (e.g., the first instance 
is negative), then the abstraction is not created and 
“size” remains in the hypothesis language. 
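
The test of a single abstraction can be sketched in a few lines. This is a simplified illustration rather than PREDICTOR's implementation: the dictionary representation of disjuncts, the oracle callback, and the rule of always taking the first available sibling are all assumptions made for the example (the text only requires that one sibling value be requested per disjunct).

```python
def strengthening_queries(hypotheses, feature, children):
    """Yield (class_label, query) pairs, one per disjunct.

    hypotheses: dict mapping a class label to a list of disjuncts,
                each disjunct a dict of feature -> value
    children:   child values of the abstraction being tested
    """
    for label, disjuncts in hypotheses.items():
        for disjunct in disjuncts:
            siblings = [v for v in children if v != disjunct[feature]]
            if not siblings:
                continue
            query = dict(disjunct)
            query[feature] = siblings[0]   # only one sibling per disjunct
            yield label, query


def abstraction_permitted(hypotheses, feature, children, oracle):
    """The biasing assumption test passes if every requested instance
    has the same class as the hypothesis it was derived from."""
    return all(oracle(query) == label
               for label, query in strengthening_queries(hypotheses, feature, children))


# The hypotheses of the "size" example; "material" is already removed.
HYPS = {
    "pos": [{"size": "small", "shape": "brick"},
            {"size": "large", "shape": "sphere"}],
    "neg": [{"size": "medium", "shape": "cylinder"}],
}
SIZES = ["small", "medium", "large"]

for label, query in strengthening_queries(HYPS, "size", SIZES):
    print(label, query)
```

With this sibling rule the three requests are a medium brick, a small sphere, and a small cylinder, which is one of the combinations described in the example.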


3.2 Bias Weakening Queries 

Because the algorithm of Section 3.1 does not 
exhaustively test the biasing assumptions (e.g., only 
one sibling value per disjunct is tested before creating 
an abstraction), abstractions made after running this 
algorithm cannot be guaranteed to be correct. There- 
fore, bias weakening queries are needed to retest the 
abstractions after a prediction error in case the predic- 
tion error is due to an incorrect abstraction rather than 
a generalization error. These queries perturb the 
values of the description of the instance for which a 
wrong prediction has been made to isolate erroneous 
abstractions that might have caused the error. 

Suppose a wrong prediction is made on a new 
instance i, and H is the hypothesis whose class differs 
from that of i. Then the following is our algorithm for 
generating bias weakening queries following a wrong 
prediction: 

Form the set A of all abstractions present in the 
hypothesis language (and in H) that apply to i. These 
are the abstractions to be tested. Elements of A are 
feature-value pairs. 

For each (f,a) ∈ A that is not already known to be 
faulty do 

    Let L be the set of all leaves in the value tree for f 
    that are below a. 

    For each v ∈ L do 

        (1) Substitute v for the corresponding feature 
        value in the description of i to form a new 
        description, i'. If i' has not yet been seen, 
        ask the concept learner to predict its class, then get the 
        actual class (from the oracle) of i'. Otherwise, 
        use the known class of i'. 

        (2) If the class of i' is the same as that of i then 
        loop again to find another element of L. Otherwise, 
        record (f,a) as being faulty and abort this 
        loop through the values of L to get another 
        element of A. 

    Endfor 

Endfor 

An abstraction applies to an instance if it has a value 
of feature f that is more general than the value of f in 
the instance. In other words, the abstraction must be a 
value tree ancestor of the value of f in the instance. 

This algorithm retests the biasing assumption 
associated with each abstraction a. The values for per- 
turbation are leaf nodes in the value tree below the 
abstraction a being tested. A feature value of i is 
varied by substituting a perturbation value into the 
description of i and requesting an instance of this new 
description from the instance generator along with the 
class of the requested instance. If perturbing any of 
the values of i causes the generation of a new instance 
of a different class than i, then the abstraction being 
tested by perturbation is faulty. This abstraction is 
faulty because it removes a distinction that is below it 
in the value tree and that is necessary for predicting 
target concept membership. 
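
The retesting loop can be sketched as below. It is a minimal illustration under stated assumptions: the value trees, the target concept (small bricks are positive), and the names leaves_below and faulty_abstractions are hypothetical, and the step in which the concept learner predicts the class of each new instance is omitted.

```python
def leaves_below(node, children):
    """Return the leaf values at or below `node` in one value tree."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves_below(kid, children)]


def faulty_abstractions(instance, instance_class, abstractions, trees, oracle,
                        known_faulty=()):
    """Retest each (feature, abstraction) pair that applies to `instance`."""
    def key(description):
        return tuple(sorted(description.items()))

    seen = {key(instance): instance_class}    # classes of instances already seen
    faulty = []
    for feature, a in abstractions:
        if (feature, a) in known_faulty:
            continue
        for v in leaves_below(a, trees[feature]):
            perturbed = {**instance, feature: v}
            k = key(perturbed)
            if k not in seen:
                seen[k] = oracle(perturbed)   # one query to the oracle
            if seen[k] != instance_class:
                faulty.append((feature, a))   # a distinction below `a` matters
                break
    return faulty


# Hypothetical value trees (parent -> children) and target concept.
TREES = {
    "size": {"any size": ["small", "large"]},
    "material": {"any material": ["aluminum", "copper", "alloy"],
                 "alloy": ["brass", "bronze", "steel"]},
}


def target(x):
    return "pos" if x["size"] == "small" and x["shape"] == "brick" else "neg"


wrong = {"size": "large", "material": "bronze", "shape": "brick"}
applied = [("size", "any size"), ("material", "any material")]
print(faulty_abstractions(wrong, "neg", applied, TREES, target))
```

The inner loop stops at the first leaf value that changes the class, so an abstraction that is permissible costs one query per unseen leaf, while a faulty one is usually detected sooner.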

Although bias weakening queries do not retest 
all biasing assumptions, they rigorously retest all bias- 
ing assumptions associated with abstractions that 
apply to i. Therefore, they identify all biasing assumption 
errors that caused the error in predicting the class 
of i. By doing so, these queries correct the bias in a 
way that enables the concept learner to regain con- 
sistency and completeness with respect to all previous 
instances, including i. We consider this testing to be 
rigorous because all descendent (leaf) values are 
tested until a value is found that disallows the abstrac- 
tion. If none is found, the concept learner regains con- 
sistency and completeness without bias shifts. 

Similarly to Section 3.1, we can see how this 
algorithm corresponds to the formal definition of 
IRR-COH(a,TC,t) in Section 2. This algorithm alters 
the value of f in the new instance i for which a wrong 
prediction has been made to create queries that request 
new instances. The algorithm then tests whether these 
newly-created instances have the same classification 
as i. The instance i plays the role of x in the 
definitions, and the perturbation values are the a_k's. If 
the abstraction a being tested is a root node value of a 
value tree, this algorithm tests the irrelevance of f. 
Otherwise, the algorithm tests the cohesion of a. 

A is the set of all abstractions that apply to i. If 
F is the number of instance features, then the maximum 
size of A is F. This is because, for each instance 
feature f, only one ancestor (in the value tree) of the 
value of f in i will be in the hypothesis language at a 
particular time, and there are at most F features in the 
hypothesis language. In other words, for each f, there 
exists at most one (f,a) ∈ A. Furthermore, for each 
(f,a) ∈ A, this algorithm tests all leaf node descendants 
of a. There are at most b^d of these descendants, where 
d is the maximum depth and b is the maximum branching 
factor of any value tree. Therefore, an upper 
bound on the number of queries generated by this 
algorithm is O(F * b^d). This algorithm is executed for 
each instance i for which the concept learner makes a 
wrong prediction. 


To illustrate bias weakening queries, we con- 
tinue with the example in Section 3.1. Suppose the 
bias strengthening queries cause “size” to be considered 
irrelevant and removed from the hypothesis 
language. The hypotheses are now: 

POS HYP: { x | ((shape(x,brick)) v 
                (shape(x,sphere))) } 

NEG HYP: { x | (shape(x,cylinder)) }. 

If a large copper brick is not among the known 
instances, and a counterexample query requests one, 
and this instance is negative, then the concept learner 
will incorrectly predict the class of this instance. Bias 
weakening queries now perturb the description of this 
instance to determine the source of the prediction 
error. Perturbation to test the abstraction to “any 
size” might result in a request for an instance that is a 
small copper brick. If this example is positive, the 
assumption that “size” is irrelevant for distinguishing 
target concept membership is incorrect. If this 
instance is negative, the next query might be a request 
for a medium copper brick. 

After bias weakening queries identify incorrect 
biasing assumptions, the bias adjuster weakens the 
bias to correct it. In the example just described, 
“size” would have to be restored to the hypothesis 
language to distinguish the small copper brick that is 
positive from the large copper brick that is negative. 
After the bias has been corrected, the concept learner 
relearns the instances to reform the hypotheses with 
the new language bias. The bias weakening queries 
enable bias weakening to be minimized because they 
identify the incorrect biasing assumptions. All 
assumptions not proven incorrect (in the past or the 
present) can be assumed to hold and can therefore be 
preserved during relearning. 

3.3 Order of the Queries 

The order for selecting the queries is as follows. 
The first query is a random instance query; the next 
query is a bias strengthening query. Bias strengthening 
queries continue as the default unless one of the 
following holds: (1) complacency occurs; (2) the con- 
cept learner makes a prediction error; or (3) no bias 
strengthening query can be formed. 

Complacency is defined to occur when the concept 
learner has made four consecutive, correct predictions.¹ 
A string of correct predictions indicates either 
that the concept has been learned or that counterexamples 
to the current hypotheses should be sought. Since 
it is not possible to know for certain whether the 
correct concept has been learned, counterexample 
queries occur in response to complacency. Bias weak- 
ening queries are the response to prediction errors. 
Once the diagnosis performed by these queries is com- 
pleted, the bias strengthening queries resume until 
complacency occurs. If none of the other queries can 
be formed, random instance queries are generated. 
Counterexample queries can be formed only if they 
generate unseen instances that are covered by the 
current hypotheses. Bias strengthening queries can be 
formed only if they generate unseen instances that are 
not covered by the current hypotheses. 
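
One possible reading of this ordering is sketched below as a small selection function. The function name, its arguments, and the fallback order when a preferred query cannot be formed are assumptions made for illustration; the text above does not fully specify how ties between the conditions are resolved.

```python
def next_query_type(is_first_query, last_prediction_wrong, consecutive_correct,
                    can_strengthen, can_counterexample, complacency_threshold=4):
    """Choose the type of the next query.

    Assumed priority: a prediction error is diagnosed first, complacency
    triggers a counterexample query, bias strengthening is the default,
    and a random instance query is the fallback.
    """
    if is_first_query:
        return "random instance"
    if last_prediction_wrong:
        return "bias weakening"
    if consecutive_correct >= complacency_threshold and can_counterexample:
        return "counterexample"
    if can_strengthen:
        return "bias strengthening"
    return "random instance"


print(next_query_type(False, False, 4, True, True))
```

A prediction error resets the count of consecutive correct predictions, so the first two conditions after the initial query cannot both hold at once.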

4 Results and Cost/Benefit Analyses 

In this section, we summarize previously pub- 
lished empirical results. We then explain these results 
from three perspectives: system bias appropriateness, a 
query cost analysis, and a query benefit analysis. 
Finally, we present an example that illustrates why our 
approach is effective. 

4.1 Empirical Results 

We have added an implementation of our 
approach to bias testing and shifting to an incremental 
concept learner to form a system called PREDICTOR 
(Gordon 1990; 1992). In the experiments of (Gordon 
1992), PREDICTOR’S performance is compared with 
that of a baseline system called Iba’s Algorithm Con- 
cept Learner (IACL), which is based on an algorithm 
from (Iba 1979). PREDICTOR is built on top of IACL 
by extending IACL’s bias shifting capabilities. PRED- 
ICTOR is identical to IACL in all ways except two: 
the former system tests the bias prior to bias shifting 
whereas the latter does not, and the former system 
prefers a stronger bias than the latter system. Both 
systems consult an oracle to answer membership 
queries. IACL’s membership queries are requests for 
randomly chosen instances. Also, both systems can 
shift the bias. However, IACL, like ID3 (Quinlan 
1986) and a number of other concept learners, inter- 
leaves hypothesis selection and term (feature) selec- 
tion. For IACL, bias shifting is not a deliberate, high- 
priority task as it is for PREDICTOR. 


1 The number of correct predictions needed was chosen by 
empirical tests. 


The empirical experiments of (Gordon 1992) 
demonstrate that when its method is appropriate, 
PREDICTOR produces an order of magnitude 
improvement in the rate of convergence to the target 
concept and its negation over IACL. The empirical 
experiments of (Gordon 1990) demonstrate that, when 
appropriate, PREDICTOR has a better convergence 
rate than all the other systems with which it has been 
compared (IACL, a variant of ID3, and a version of 
AQ described in Michalski et al. 1986). If inappropri- 
ate, however, PREDICTOR produces a performance 
degradation with respect to IACL and the other sys- 
tems. PREDICTOR’S method is appropriate precisely 
when one would expect it to be - when bias shifting is 
the most expedient action to take to learn the target 
concept, i.e., there is a large disparity between the 
instance language and the language in which the target 
concept can be expressed most succinctly. 

It is easy to see how this disparity would be 
likely to occur in many real-world learning situations. 
Most objects are described in terms of primitive 
features. It is reasonable to expect that a knowledge 
engineer, who is familiar with the domain in which 
concept learning will occur, would be aware of a 
number of potentially useful abstractions but would 
not be certain which abstractions are relevant for 
learning the concept. Therefore, this engineer might 
have the system begin with the known primitive 
features, but provide the system with potential abstrac- 
tions. When provided with a set of potentially useful 
value trees, PREDICTOR’S queries can isolate those 
abstractions which are correct and thereby expedite 
the learning process. 

From an experimental study described in (Gor- 
don 1992), we have learned that PREDICTOR has a 
synergistic effect between its bias shifting method and 
its bias tests. The system’s bias shifting method pro- 
motes this synergy because few bias strengthening 
queries are required before an abstraction occurs. 
(Recall from Section 3.1 that the number is propor- 
tional to the number of hypothesis disjuncts.) Further- 
more, the system minimally weakens the bias after a 
set of bias weakening queries (see Gordon 1992). In 
both cases, PREDICTOR is able to capitalize on the 
query results to gain and maintain a strong bias. 

4.2 System Bias Appropriateness 

In this paper, we use the term “bias” 
synonymously with “hypothesis language bias”. This 
is the bias that PREDICTOR adjusts. In this section, 
we will discuss a meta-level bias, namely, the system 
bias. To avoid confusion with the hypothesis language 
bias, hereafter, we refer to the system bias as the 
system policy or simply the policy as in (Provost & 
Buchanan 1992). At some level, all concept learning 
systems have some form of fixed system policy. For 
example, even the most adjustable system cannot 
implement all possible bias adjustment methods. The 
choice of bias adjustment methods is a fixed system 
policy. No one policy can be best for learning every 
target concept. Therefore, we need to develop an 
understanding of the appropriateness of system poli- 
cies for learning different classes of concepts. 

PREDICTOR’S policy is its implicit assumption 
that abstraction is appropriate to try (by strengthening 
the language bias) and to retain (by minimally weak- 
ening the language bias) as much as possible. 
Strengthening and minimally weakening the bias are 
appropriate when they are the correct actions to take 
in the context of the current instance language and target 
concept. 

PREDICTOR’S bias strengthening and weaken- 
ing queries, which form the majority of the system’s 
queries, are geared entirely toward gathering informa- 
tion for bias shifting. This system uses its queries to 
reorder the instances to increase its information gain 
about the hypothesis language bias early in the learn- 
ing process. Bias shifting is its priority. If this priority 
matches the task, PREDICTOR usually outperforms 
all other systems with which it is compared. This is 
because once PREDICTOR’S queries have achieved 
their goal of a strong correct bias, the number of 
instances required to converge on the target concept is 
often significantly reduced by this bias shift (see Sec- 
tion 4.4). 

Although IACL can adjust the hypothesis 
language, PREDICTOR generally far outperforms 
IACL when bias shifting is important. This is because 
IACL’s policy places a higher priority on finding less 
general hypotheses and also on maintaining hypothesis 
consistency and completeness with previous training 
instances than it does on bias shifting (see Gordon 
1992). This system only shifts the bias when it 
decides this is the best way to achieve its other priori- 
ties. IACL’s queries are requests for randomly- 
generated instances. Therefore, this system does not 
favorably order the instances for gathering informa- 
tion about the bias. The other systems with which 
PREDICTOR has been compared have problems that 
are similar to IACL’s. 

In the next two sections, we present cost/benefit 
analyses that further explain the empirical results. 


4.3 Query Costs 

The upper bounds on PREDICTOR’S queries, 
presented in Sections 3.1 and 3.2, seem somewhat 
high. Why does this system generate fewer queries 
(converge earlier) than IACL when bias shifting is 
appropriate for learning the target concept? Why does 
it generate more queries than IACL when it is inap- 
propriate? In this section, we present a rough cost 
comparison between the number of queries generated 
by PREDICTOR and the number generated by IACL. 
To simplify our analysis and avoid a confusion 
between the effects of generalization and those of 
abstraction, let us suppose that no generalization is 
needed for learning the target concept. 2 

In Section 3.1, we show that an upper bound on 
the number of bias strengthening queries is 
O(F * D * d), where F is the number of instance 
features, D is the maximum number of hypothesis dis- 
juncts, and d is the maximum depth of any value tree. 
According to Section 3.2, an upper bound on the 
number of bias weakening queries generated in 
response to each wrong prediction on a new instance is 
O(F * b^d), where b is the maximum branching factor 
of any value tree. 

Suppose W wrong predictions are made prior to 
convergence. 3 Then an upper bound on the number of 
bias weakening queries generated to resolve W wrong 
predictions is O(W * F * b^d). Therefore, an upper 
bound on the total number of bias strengthening and 
weakening queries prior to convergence to the target 
concept and its negation is O((F * D * d) + (W * 
F * b^d)). 

The total number of PREDICTOR’S queries also 
includes random instance and counterexample queries. 
However, in this analysis we ignore the cost of these 
two types of queries because at most four random 
instance queries occur before the counterexample 
queries activate (see Section 3.3), and the counterex- 
ample queries typically do not take long to find a 
counterexample. Once a counterexample is found, the 
bias weakening and then strengthening queries 
resume. 


2 This simplification does not significantly affect our analysis. 

3 PREDICTOR also executes bias weakening queries at 
another time besides when it makes a wrong prediction. 
Nevertheless, this does not significantly alter our cost estimates. 
We therefore omit a discussion of this topic. See (Gordon 1992) for 
details. 
IACL’s bias shifts are triggered by the order of 
the training instances (see Gordon 1992). A serendipi- 
tous order will enable the system to make the correct 
abstractions early. However, since this order is ran- 
dom, it cannot be guaranteed to be helpful. Further- 
more, as mentioned in Section 4.2, IACL’s bias shifts 
occur when IACL decides this is the best way to 
achieve its other goals. The main problem with this 
approach is that it does not offer much help to a system 
that needs to recover from incorrect abstractions. 
Often, what is really an abstraction problem is treated 
as a generalization problem by the system. As a result, 
incorrect abstractions often linger, thereby preventing 
correct abstractions from being made. In the worst 
case, this would cause the number of queries required 
for convergence to the target concept and its negation 
to equal the total number of instances, which is 
O(b^(d * F)), where b^d is the maximum number of leaf 
nodes in any value tree. 

Comparing the upper bounds for the two sys- 
tems, we note that as F increases, IACL’s performance 
should degrade much more rapidly than that of PRED- 
ICTOR. Furthermore, the upper bound cost for IACL 
depends on the data structures but not on the target 
concept. On the other hand, W and D in the formula 
for PREDICTOR’S upper bound cost depend, at least 
in part, on the target concept. 

D depends almost entirely on the target concept. 
W, on the other hand, depends both on the target concept 
and the bias appropriateness. If we assume the 
data structures and target concept are fixed, and we 
wish to analyze the effectiveness of bias shifting, then 
we need to focus on the W component of the cost formula. 
In particular, as W approaches zero, 
PREDICTOR’S cost upper bound approaches a polynomial. 
W tends toward zero as a greater proportion 
of the biasing assumptions made by PREDICTOR are 
correct, e.g., most features are irrelevant and therefore 
the irrelevance assumption holds frequently. This is 
precisely the situation in which empirical experiments 
have shown PREDICTOR outperforms IACL. 

Likewise, when most of the biasing assumptions 
are incorrect, W can become very large. In the worst 
case, we would have the same situation as we have 
with IACL, where all instances would have to be seen 
to learn the target concept. Again, empirical experi- 
ments confirm this analysis. 
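
The two upper bounds can be compared numerically with a short sketch. It assumes that the worst case for IACL is the total number of instances, read here as (b^d)^F (at most b^d leaf values for each of F features); the function names and the sample parameter values are hypothetical, and constant factors hidden by the O-notation are ignored.

```python
def predictor_bound(F, D, d, b, W):
    """Bias strengthening plus bias weakening queries: F*D*d + W*F*b^d."""
    return F * D * d + W * F * b ** d


def iacl_bound(F, d, b):
    """Worst case for IACL: every instance is seen, (b^d)^F of them."""
    return (b ** d) ** F


# Illustrative values only; they are not taken from the experiments.
F, D, d, b = 3, 2, 2, 3
for W in (0, 2, 10):
    print(W, predictor_bound(F, D, d, b, W), iacl_bound(F, d, b))
```

With W = 0 the PREDICTOR bound reduces to the polynomial term F * D * d, whereas the IACL bound grows exponentially in F regardless of W.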

Our analysis of IACL’s upper bound cost is 
much like that of the other systems with which PRED- 
ICTOR has been compared because these systems are 
also not designed for bias testing and shifting. Their 
system policies favor other tasks. 

4.4 Query Benefits 

One of the goals of our method for bias testing 
and shifting is to reduce the number of features in the 
hypothesis language. Relevant results from computa- 
tional learning theory can provide a rough estimate of 
the benefits of strengthening the bias in this way. 4 The 
sample complexity is the number of instances required 
to converge to the target concept. (Haussler 1988) has 
shown that sample complexity relates directly to the 
Vapnik-Chervonenkis (VC) dimension of a hypothesis 
space, which is a measure of the expressiveness of the 
hypothesis language. A less expressive language 
implies a stronger bias and a lower VC-dimension. 
Haussler measures convergence in the Probably 
Approximately Correct (PAC) framework of learnability, 
which assumes the error ε > 0 and the confidence 
(1 - δ) < 1. In this framework, a concept is expected to 
be learned approximately with high probability. 

According to (Haussler 1988), given a fixed ε 
and δ, the minimum sample complexity is directly proportional 
to the VC-dimension. Furthermore, if the 
concept hypothesis is in k-DNF (which is true for 
many concept learners), then 

$$\text{VC-dim}(H) \le 4ks \log\left(4ks\sqrt{n}\right),$$ 

where H is the hypothesis space, n is the number of 
features in the instance language, s is a bound on the 
number of terms (disjuncts), $k \le n$, and $s \le \binom{n}{k}$. 

We can now apply this theoretical estimate of 
sample complexity to partially explain the empirical 
results summarized in Section 4.1. In our framework, 
empirical performance is measured in terms of abso- 
lute convergence to the target concept, rather than 
PAC convergence. Absolute convergence implies the 
error ε is 0 and the confidence (1 - δ) is 1. In other 
words, we require that the target concept be learned 
precisely. We also require that the negation of the tar- 
get concept be learned precisely for absolute conver- 
gence to hold. (These requirements are meaningful for 
applications where the cost of making an error is 
high.) 

In our framework, the maximum number of 
literals per term, k, is equal to the number of (relevant) 


4 Only a “rough" estimate can be provided because the 
computational learning theory makes assumptions, such as instance 
selection from a fixed distribution, that are not met in our 
framework. 
features in the hypothesis language. Also, we use a 
different bound on s for our framework because when 
there are irrelevant features, n no longer affects the 
number of disjuncts. If we assume there is a maximum 
of v values for any of the k relevant features, s ≤ v^k. 
When PREDICTOR discovers irrelevant features and 
strengthens the bias (by reducing k), Haussler’s ine- 
quality (with the new upper bound on s) predicts that 
the system will reduce the VC-dimension and thus 
reduce the sample complexity. With the new bound on 
s, Haussler’s inequality also predicts a reduction in 
sample complexity when v is reduced by abstractions 
that are made based on cohesion assumptions. Furth- 
ermore, PREDICTOR’S ability to minimally weaken 
the bias enables it to retain a hypothesis space with a 
low VC-dimension. 
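
The effect of reducing k or v can be illustrated numerically. The sketch assumes that the bound has the form 4ks log(4ks sqrt(n)) with s bounded by v^k; the base of the logarithm (base 2 here), the function names, and the sample values are assumptions, so only the direction of the change is meaningful.

```python
import math


def vc_upper_bound(k, s, n):
    """Assumed form of the bound: 4ks * log2(4ks * sqrt(n))."""
    return 4 * k * s * math.log2(4 * k * s * math.sqrt(n))


def max_terms(v, k):
    """Bound on the number of disjuncts with k relevant features
    of at most v values each."""
    return v ** k


n = 3  # number of features in the instance language (illustrative)

full = vc_upper_bound(3, max_terms(3, 3), n)          # all three features kept
fewer_features = vc_upper_bound(2, max_terms(3, 2), n)  # one feature found irrelevant
fewer_values = vc_upper_bound(2, max_terms(2, 2), n)    # plus a cohesion abstraction

print(round(full), round(fewer_features), round(fewer_values))
```

Each bias strengthening step lowers the bound, which is the sense in which a stronger bias is expected to reduce the sample complexity.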

In summary, when bias strengthening is desir- 
able, the cost of using queries to gather information 
about the bias can be offset by the benefit of a reduc- 
tion in the sample complexity gained by having a 
stronger bias. In Section 4.3, we showed how in this 
situation the costs are also reduced. So, when 
appropriate, PREDICTOR’S method can yield lower 
costs and increased benefits and a better performance 
in comparison with IACL and other systems. When 
inappropriate, the costs increase and the benefits are 
reduced and system performance degrades with 
respect to the performance of IACL and other systems. 

4.5 Illustrative Example 

Let us examine a very simple illustration of how 
the queries and bias shifts together can result in a synergy 
that speeds up convergence. We will focus 
primarily on the value of the bias weakening queries 
and minimal bias weakening. Assume we have two 
concept learners, CL-Q and CL. They differ only in 
that CL-Q uses the bias weakening queries and per- 
forms minimal bias weakening, whereas CL does not. 
CL’s method for bias weakening is to make the 
hypothesis language equal to the instance language. 
CL’s motivation for bias weakening is the same as that 
of CL-Q, namely, to resolve prediction errors. Other 
than the bias weakening queries of CL-Q, we assume 
both systems request random instances from an oracle. 

For simplicity, let us further suppose that both 
systems are given a strong bias beforehand by the sys- 
tem implementor and their only bias shifting task is to 
weaken the bias if they discover (by a wrong prediction) 
that the bias is incorrect. Also, for simplicity, we 
assume neither system generalizes. They only learn 
concepts using bias adjustments. 


The data structures given to both systems are the 
value trees of Figure 1, except that feature “size” is 
now restricted to having the values “small” and 
“large”, and the value “curved-solid” has no child 
values. The target concept, as in Section 1, states that 
small bricks are positive and instances of any other 
description are negative. Finally, both systems begin 
with a hypothesis language bias which states that 
“shape” is the only relevant feature. 

Both systems begin by requesting the same two 
instances: a small aluminum brick, which is posi- 
tive, and a large steel curved-solid, which is negative. 
The current hypothesis now held by both systems is: 

POS HYP: { x | (shape(x,brick)) } 

NEG HYP: { x | (shape(x,curved-solid)) }. 

If the next instance requested by both systems is a 
large bronze brick, and it is negative, both systems 
will make a prediction error, which triggers bias weak- 
ening. 

CL-Q responds to the prediction error by using 
its bias weakening queries. It requests one instance to 
retest the abstraction to “any size”: a small bronze 
brick, which is positive. Since the class is different 
from that of the large bronze brick, CL-Q decides the 
abstraction to “any size” is faulty and removes it. 
CL-Q then requests a large aluminum brick, a large 
copper brick, a large brass brick, and a large steel 
brick to retest the abstraction to “any material”. 
Since these instances have the same class as the large 
bronze brick, this abstraction is considered permissi- 
ble. CL-Q now minimally weakens the bias to obtain 
the hypotheses: 

POS HYP: { x | 
    (size(x,small) & shape(x,brick)) } 

NEG HYP: { x | 
    ((size(x,large) & shape(x,brick)) 
     v 
     (size(x,large) & shape(x,curved-solid))) }. 

The system now needs one more instance, a small 
curved-solid of any material, to converge on the tar- 
get concept and its negation precisely. The odds are 
high that a random choice will request this instance 
before all instances have been seen. (Note that if the 
bias strengthening queries were used, this would be 
the next instance requested.) 
After making a prediction error on the large 
bronze brick, CL behaves differently. It does not 
make the extra five queries that CL-Q made. Neither 
does it have the extra information CL-Q obtained. 
Rather than minimally weakening the bias, CL weak- 
ens the bias to the instance language. CL, therefore, 
requires all remaining 27 instances to precisely learn 
the target concept and its negation. In this case, it is 
clear that the information gained by CL-Q from the 
queries has offset their cost. 

5 Related work 

Our approach is related to theoretical research 
on irrelevance (e.g., Subramanian 1989) and relevance 
(Grosof & Russell 1989). Subramanian’s definition of 
irrelevance is similar to ours. However, her definition 
is tailored for reformulating a problem solver’s 
language to increase the problem solver’s efficiency. 
Our definition, on the other hand, is tailored for incre- 
mental concept learning. Grosof and Russell have 
created a theory of shifting bias as nonmonotonic rea- 
soning. They use the notion of relevance to motivate 
bias shifts. Prior to learning, biases are ordered from 
stronger (i.e., having fewer relevant components) to 
weaker (i.e., having more relevant components). 
Learning begins with a strong bias and shifts to 
weaker biases as needed. Unlike the bias shifts 
described here, Grosof and Russell’s bias shifts are not 
motivated by an analysis of biasing errors and there- 
fore they are unable to guarantee that the bias can be 
minimally weakened. 

Our approach is also related to approaches that 
form abstractions based on equivalence classes. If 
cohesion holds for a value a of feature f, then a forms 
an equivalence class in terms of the target concept 
membership of instances having this value of feature f. 
Kokar’s COPER is an example of another system that 
uses equivalence classes for concept learning (Kokar 
1990). COPER uses the concept of invariance and an 
expectation of equivalence classes to indicate when 
constructive induction is needed. Constructive induc- 
tion is the dynamic generation of new features. 
PREDICTOR could use a similar approach to 
COPER’S to decide when to invent new abstractions. 
For example, if cohesion does not hold for some 
abstraction a in a value tree, this could be considered 
an indication of the need to split a into two separate 
abstractions for which cohesion does hold. 

The use of active learning in our approach is 
related to literature on queries for concept learning, 
both theoretical (e.g., Angluin 1988) and experimental 
(e.g., Sammut & Banerji 1986; Muggleton 1987). Our 
approach is most similar to that of the few systems that 
query an oracle and shift the bias. Notable examples 
include the MARVIN system of Sammut & Banerji 
(1986), Gross’s CAT (Gross 1991), Muggleton’s Duce 
(1987), and the CLINT system described in De Raedt 
& Bruynooghe (1990). MARVIN shifts the bias by 
learning the definitions of new user-selected terms and 
then uses these new terms for further learning. This 
system also queries an oracle to test generalizations 
within the term definitions. These queries involve a 
form of perturbation similar to that used here. 
Nevertheless, MARVIN’s queries are not for the pur- 
pose of deciding how to shift the bias. 

CAT, Duce, and CLINT are all systems that 
query an oracle to make bias shifting decisions. Of all 
these approaches, our approach is most similar to that 
of CLINT, which uses irrelevance queries for bias 
strengthening. Nevertheless, the latter system does not 
use irrelevance queries to select a weaker bias. Furth- 
ermore, our approach is unlike that of CLINT and all 
other systems because our choice of a weaker bias is 
determined by bias tests that diagnose errors in such a 
way as to guarantee that the bias can be minimally 
weakened. 

6 Summary 

This paper presents a unique approach to bias 
shifting. Rather than performing unjustified bias 
shifts, as most concept learning systems do, we first 
test assumptions about the relationship between the 
bias and the concept being learned. We also re-test 
these assumptions in light of new, possibly contradic- 
tory evidence, i.e., a prediction error. These tests are 
performed with queries to an oracle. By using this 
approach, a system can both strengthen and minimally 
weaken the bias. 

In addition to presenting our method for bias 
testing, this paper also summarizes empirical results. 
The empirical results are then explained by a 
cost/benefit analysis. Both the empirical and analyti- 
cal results indicate that when bias strengthening is 
desirable, the query costs are lower and the benefits of 
a stronger bias, namely, a reduced sample complexity, 
are increased. Likewise, when bias strengthening is 
not appropriate, the query costs are increased and the 
benefits are reduced. The low costs and increased 
benefits have produced a large performance gain over 
other systems when bias strengthening is appropriate, 
and the high costs and decreased benefits have pro- 
duced a performance loss with respect to other systems 
when bias strengthening is inappropriate. 

Abstract 

We aim to help build programs that do large-scale, 
expressive non-monotonic reasoning (NMR): especially, 
“learning agents” that store, and revise, a body of 
conclusions while continually acquiring new, possibly 
defeasible, premise beliefs. Currently available 
procedures for forward inference and belief revision 
are exhaustive, and thus impractical: they compute the 
entire non-monotonic theory, then re-compute from 
scratch upon updating with new axioms. These methods 
are thus badly intractable. In most theories of 
interest, even backward reasoning is combinatoric (at 
least NP-hard). Here, we give theoretical results for 
prioritized circumscription that show how to 
reformulate default theories so as to make forward 
inference selective, as well as concurrent; and to 
restrict belief revision to a part of the theory. We 
elaborate a detailed divide-and-conquer strategy. We 
develop concepts of structure in NM theories, by 
showing how to reformulate them in a particular 
fashion: to be conjunctively decomposed into a 
collection of smaller “part” theories. We identify two 
well-behaved special cases that are easily recognized 
in terms of syntactic properties: disjoint appearances 
of predicates, and disjoint appearances of individuals 
(terms). As part of this, we also definitionally 
reformulate the global axioms, one by one, in addition 
to applying decomposition. We identify a broad class 
of prioritized default theories, generalizing default 
inheritance, for which our results especially bear 
fruit. For this asocially monadic class, decomposition 
permits reasoning to be localized to individuals 
(ground terms), and reduced to propositional reasoning. 
Our reformulation methods are implementable in 
polynomial time, and apply to several other NM 
formalisms beyond circumscription. 


Introduction 

Large-Scale, Expressively Rich, Learning Agents: We 
aim in this work¹ to help build agents that do 
large-scale, expressive non-monotonic reasoning (NMR). 
We are interested especially in what we call learning 
agents: automatic programs that store, and revise, a 
body of conclusions while continually acquiring new, 
possibly defeasible, premise beliefs. 

In many applications, information about which defaults 
take precedence over others (have greater 
prioritization) is important and available.² Many 
applications need the ability to express fairly 
arbitrary first-order forms of default beliefs (e.g., 
induction, law, natural language, communication), as 
well as fairly arbitrary (finite) partial orders of 
precedence (e.g., specificity, reliability, and 
authority are not “layered” (a.k.a. “stratified”)). 
[Grosof, 1991] defines and discusses the importance of 
non-layered priority. Non-layered priority is needed, 
for example, to adequately represent default 
inheritance. 

In these applications, we regard as desirable for 
many reasons, especially validation (both intuitive and 
rigorous), that a NM formalism be “expressively rich” 
not only in the above senses, but also that it be 
equipped with a relatively strong model-theoretic se- 
mantics (e.g., cf. Default Logic [Reiter, 1980], circum- 
scription [McCarthy, 1986] [Lifschitz, 1984], and Au- 
toepistemic Logic [Moore, 1985]). In this connection, 
we also are interested in skeptical or cautious, rather 
than credulous or brave, entailment. 

Current Incapabilities: Currently, expressively 
rich NMR³ has found virtually no application on a 
large scale (more than on the order of ten defaults), 
except for the rather special cases of Prolog-style 
logic programs and simple inheritance cf. AI 
frame-based systems. 

1 Part of forthcoming PhD dissertation [Grosof, 1992b]. 

2 Note, however, that most of the discussion and results, 
e.g., about disjoint describability and definitional 
reformulation and asocially monadic theories, in this 
paper also apply to the basic case where there are only 
two “priority levels”: for-sure and defeasible. 

3 In [Grosof, 1992b], we make this more precise; here, let 
us just consider circumscription, Default Logic, and 
Autoepistemic Logic. 

Part of the problem is that there do not yet exist 
practical inference mechanisms to support storing 
and revising a limited body of conclusions as a working 
theory. Currently, for expressively rich NMR⁴, the only 
procedures for forward⁵ inference are exhaustive: they 
compute the entire non-monotonic theory (or, even 
worse, all credulous extensions). Also, currently, 
there are no procedures for performing belief revision 
on a body of conclusions, upon receiving new, asserted 
axioms (an update), beyond the exhaustive method of 
re-computing everything from scratch. (By “axiom”, we 
mean a premise belief.)⁶ By belief revision, we mean 
modifying the stored conclusions to retract those that 
are no longer entailed by the newly augmented axiom 
set.⁷ By updating, we mean belief revision plus 
possibly the inference and storing of some additional 
conclusions. 

Strategy and Summary: In this work, we attack 
these problems at the level of logical understanding 
(rather than, say, domain-dependent control of reason- 
ing). Our analytic perspective is that a prime underly- 
ing difficulty in the tasks of inference and updating, as 
well as in specification, is the logical globality of NMR: 
in general, conclusions depend on the whole of the ax- 
iom set. The exhaustiveness of current methods is, in 
effect, a manifestation of their caution in dealing with 
(conflicting) interaction. 

We define the concept of a prioritized database, using 
circumscription, as the logical representation of a 
learning agent that performs sound, but incomplete, 
expressively rich NMR. By database, we mean a sub- 
set of a (NM) theory. Prioritized circumscription meets 
our prime expressive concerns, offers mathematical 
convenience, and has inference procedures currently 
available. 

We elaborate a detailed “divide and conquer” strategy. 
We develop concepts of, and results about, structure in 
prioritized circumscriptive theories, by showing how to 
reformulate them in a particular fashion: to be 
conjunctively “decomposed” hierarchically into a 
collection of smaller “part” theories, i.e., sub-theories 
which we call slices. We show that it is possible, and 
useful, to slice within slices. In this way, we map 
groups of axioms to groups of conclusions. We use 
the decompositions to analyze the interaction between 
defaults / parts in a NM theory. Much technical diffi- 
culty and trickiness arises from the expressive need to 
consider non-layered prioritization. 

We give theorems that localize entailment and thus 
show how to make forward inference selective, as 

4 including even the propositional special case and the 
special case of stratified logic programs with negation 
[Lifschitz, 1987] [Przymusinski, 1988] 

5 bottom-up. By “backward”, we mean totally goal-directed 
cf. query-answering. 

6 NM formalisms, e.g., JTMS’s [Doyle, 1979], having 
such procedures lack our desired expressive properties. 

7 For simplicity, we assume that these are the only ones 
removed from storage. 


well as concurrent. Exhaustive inference on a slice 
generates only a part of the global theory. Inferences 
within each slice (sub-theory) can be performed in par- 
allel with inference within every other slice. All non- 
monotonic inference can be localized to the slices; only 
monotonic inference is required between the slices. We 
give theorems that localize retraction and thus show 
how to make belief revision be partial in the sense that, 
for a given update, the arena of potential retraction is 
known to be restricted to a particular part of the pre- 
vious database. 

Our results enable the exploitation of other results 
on inference and belief revision that are limited to ex- 
pressive special cases, say to do exhaustive forward 
inference in polynomial time (e.g., the “sympathetic- 
solitary” case in [Grosof, 1992b] that generalizes pred- 
icate completion [Clark, 1978] and the Closed World 
Assumption). These special case results can be ap- 
plied to one, or several, slices, even when they do not 
apply to the global theory. 

Our results are about well-behaved special cases that 
are easily recognized in terms of syntactic properties. 
The first “cleanly slice-able” property is disjointness of 
mentioned predicates. We show that if the for-sure and 
default axioms can be partitioned into groups which 
are disjoint in terms of the predicate symbols they men- 
tion, then non-monotonic inference based on each par- 
tition can proceed without considering the axioms in 
the other partitions: those other axioms are irrelevant 
in an important sense, as far as that partition is con- 
cerned. We show this implies that updating with new 
for-sure and default axioms that span only some of the 
previous partitions does not require retracting previous 
conclusions based purely on the remaining partitions: 
they are safe. 
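
The disjoint-predicates condition can be pictured as a connected-components computation over the predicate symbols that axioms share. The following is a minimal sketch under stated assumptions, not the paper's (omitted) algorithm: the function name `partition_by_predicates` and the dictionary encoding of axioms are hypothetical, the axiom numbers follow the example's updates U1 to U3, and precedence axioms (which mention default labels rather than predicates) are left out.

```python
from collections import defaultdict


def partition_by_predicates(axioms):
    """Group axioms so that no two groups share a predicate symbol.

    `axioms` maps an axiom id to the non-empty set of predicate symbols
    it mentions (a hypothetical encoding chosen for this sketch).
    Returns a list of sets of axiom ids, ordered by smallest id.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Predicates mentioned by the same axiom end up in the same component.
    for preds in axioms.values():
        ordered = sorted(preds)
        for p in ordered:
            parent[find(ordered[0])] = find(p)

    groups = defaultdict(set)
    for ax, preds in axioms.items():
        groups[find(min(preds))].add(ax)
    return sorted(groups.values(), key=min)


# Predicate symbols mentioned by the for-sure and default axioms of U1-U3.
EXAMPLE = {
    1: {"bat", "mammal"}, 2: {"dog", "mammal"}, 3: {"bat"}, 4: {"dog"},
    5: {"2legs", "4legs"}, 6: {"bat", "2legs"}, 7: {"mammal", "4legs"},
    9: {"fire", "person", "leave"}, 10: {"earthquake", "person", "leave"},
    11: {"person"}, 12: {"fire"}, 13: {"earthquake"},
    14: {"leave", "attend_work"},
}

slices = partition_by_predicates(EXAMPLE)
```

On this input the legged-ness axioms and the emergency axioms fall into two separate groups, which is the situation in which the third update leaves the earlier conclusions untouched.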

Most large practical applications, however, do not 
display such perfect partitionability of mentioned 
predicates. The real power from our result about 
disjointness of predicates comes when it is combined 
with another kind of reformulation: of the axioms in a 
given global axiom set, not just of the global axiom set 
into decomposed constituent axiom sets. We define a 
concept of disjoint describability: syntactic 
partitionability after definitional reformulation of 
the axioms. As part of this, we give a logical 
definition of a particular kind of definitional (i.e., 
equivalence-preserving) reformulation with respect to a 
background theory, modifying the standard logical idea 
of a conservative extension. We also discuss, and use, 
another kind of reformulation: to break up open 
defaults (i.e., schema-type, as opposed to closed, i.e., 
propositional) into cases. An important difference 
from definitionally reformulating monotonic theories 
is that two default axioms D1 and D2 cannot, in general, 
be equivalently replaced by the default axiom 
corresponding to the conjunction D1 ∧ D2 the way that 
two for-sure axioms can always be equivalently replaced 
by their conjunction B1 ∧ B2. This is why we need to 
consider reformulation of the axioms one-by-one. 

Using these definitional and default-cases reformu- 
lations, we arrive at our second cleanly slice-able, yet 
syntactically recognizable, property: disjointness of 
mentioned individuals. We show that a fairly broad 
class (“asocially monadic”) of prioritized default 
circumscriptions is cleanly slice-able into one slice 
theory per named individual (ground term in the 
language) plus a remainder-case slice. Each of these 
individual-wise slices is propositional, and is, essen- 
tially, much simpler than the global, in several ways: 
number of axiom instances (especially, potential primi- 
tive default conclusions), availability of inference meth- 
ods, and availability of known computational complex- 
ity results. (Unfortunately, we do not have space to 
discuss NM inference methods and complexity results 
in detail. But see the final section.) The asocially 
monadic class includes, as a special case, default 
inheritance networks of the kind studied by [Touret- 
zky, 1986] and used in many AI frame-based systems. 
The asocially monadic class is more general, however: 
it permits more than one antecedent in default rules, 
free use of negation, and freer use of disjunction. 

While definitional reformulation is hard in general, 
we have a polynomial-time algorithm (omitted in 
this draft to save space and preserve focus) to perform 
the recognition and exploitation of this 
asocial-monadic reformulation and decomposition. More 
precisely, the algorithm is O(n³), where n is the size of the 
(global) axiom set (which is, moreover, typically much 
smaller than the whole theory, of course). 

We show that our conjunctive decomposition results 
imply safeties in belief revision. We illustrate the prob- 
lems of scale in learning agents with an extended exam- 
ple of a prioritized database and show that our safety 
theorems capture much of the preponderant stability 
(i.e., most beliefs are preserved after each update) that 
this database displays through its sequence of updates. 
We show, using the example, that decompositions on 
these two bases combine synergistically, as well as 
hierarchically: it is useful to slice within slices. 

Finally, we observe that our formal reformulation 
methods are implementable at reasonable cost, and 
apply to several other NM formalisms. We have 
polynomial-time algorithms (again, omitted here 
due to space and focus) for disjoint predicates, as well 
as for asocially monadic, also in O(n³), where n is the 
size of the (global) axiom set. 

A Motivating Example 

Next, we give an extended example of a learning agent, 
in the domain of common-sense default reasoning, that 
illustrates issues of selective forward inference and par- 
tial belief revision on a large scale. We present it first 
at an intuitive level, and formalize it later. 

We adopt the following notation. A •> prefix indicates 
that the sentence that follows is a base axiom, i.e., 
has for-sure (non-defeasible) belief status. A :> 
prefix indicates that the formula that follows is a 
default axiom (roughly, a normal default without 
pre-requisite in Default Logic). Its label, e.g. (d1), 
serves as a tag for defining prioritization-type 
precedence between defaults via PREFER (prioritization) 
axioms. These define a strict partial order of 
precedence, via transitive closure. PREFER(d1, d2), for 
example, means that the default axiom with label (d1) 
has strictly greater precedence (priority) than the 
default axiom with label (d2). 
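
To make the "strict partial order via transitive closure" reading concrete, here is a small sketch that closes a set of pairwise precedence statements and rejects cyclic input. The function name `precedence_closure` and the pair encoding (higher label first) are assumptions made for illustration; the sample pairs are the precedence axioms [8], [30] and [31] of the example.

```python
def precedence_closure(prefer_pairs):
    """Return the transitive closure of (higher, lower) label pairs.

    Raises ValueError if the closure relates a label to itself, since the
    result would then not be a strict partial order.
    """
    closure = set(prefer_pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # a > b and b > d together give a > d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    cyclic = sorted(a for (a, b) in closure if a == b)
    if cyclic:
        raise ValueError(f"not a strict partial order; cycle through {cyclic}")
    return closure


PAIRS = {("d1", "d2"), ("d10", "d9"), ("d3", "d10"), ("d4", "d10"), ("d5", "d10")}
R = precedence_closure(PAIRS)
```

The closure adds the derived pairs (d3, d9), (d4, d9) and (d5, d9). The nested loop is chosen for readability; a production version would more likely use a standard closure algorithm.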

We make the Uniqueness of Names Assumption 
(consider it included as a for-sure axiom). As a short- 
hand for conjunctions of for-sure assertions of posi- 
tive or negative literals, we list the satisfying objects, 
or, more generally, tuples. Often, in this context, we 
use “. . .” to indicate that there are additional satisfy- 
ing tuples not shown explicitly; for simplicity’s sake, 
we assume these objects are distinct from all other 
explicitly-shown objects. 

In this example, the agent starts with no beliefs, then 
accumulates axioms by receiving updates. After each 
update, the agent draws a bunch of conclusions (say, 
ground first-order sentences), both monotonically and 
non-monotonically, and retracts some of its previous 
conclusions. Each U_i indicates an update, consisting 
of one or more axioms. Axioms are numbered. In 
addition, we show explicitly a few of the more 
interesting NM conclusions and retractions, about which 
discussion will revolve. Note that, by “conclusion”, we 
always mean in the skeptical sense. 

The first update consists of a default axiom, that 
bats have two legs, together with some for-sure axioms. 
Non-monotonic (default) conclusions include that known 
bats are two-legged. The second update consists of 
another default axiom, that mammals have four legs, 
together with the precedence axiom, that this new 
default has lower priority than the previous, more 
specific one. The third update consists of two default 
axioms about emergency disaster situations, plus some 
associated for-sure information. Intuitively, since the 
axioms in this new update are about a totally differ- 
ent topic than the previous axioms, they should not 
result in having to retract any of the previous conclu- 
sions. Moreover, intuitively, the agent should be able 
to draw the conclusions from these new axioms without 
even having to consider the previous ones in detail. 

The fourth update consists of some for-sure information 
about two named individuals, Joe and Spot, that violates 
some previous default conclusions. Intuitively, since 
there is no information that “connects” any other named 
individuals to Joe and Spot, these new axioms should not 
result in having to retract any of the previous 
conclusions that are not about those named individuals: 
e.g., that are about some other named individuals. For 
example, the previous default conclusion 2legs(Betsy) 
should not have to be retracted. 

Later, we will show how to capture these intuitions 
Example’s Axioms and Sample Conclusions 

U1: Mammals Taxonomy plus: Bats are Two-Legged 

[1] •> ∀x. bat(x) ⊃ mammal(x) 
[2] •> ∀x. dog(x) ⊃ mammal(x) 
[3] •> +bat: Betsy, Joe, June, Jackie, ... 
[4] •> +dog: Fido, Spot, Siccem, Jumper, ... 
[5] •> ∀x. ¬(2legs(x) ∧ 4legs(x)) 
[6] (d1) :> bat(x) ⊃ 2legs(x) 
NM conclusions: 2legs(Betsy) ∧ 2legs(Joe) ∧ ... 

U2: Lower-Priority Default about Legged-ness 

[7] (d2) :> mammal(x) ⊃ 4legs(x) 
[8] PREFER(d1, d2) 
NM conclusions: 4legs(Fido) ∧ 4legs(Spot) ∧ ... 

U3: Emergencies (cf. [Grosof, 1991]) 

[9] (d3) :> fire(place, day) ∧ person(x) ⊃ leave(x, place, day) 
[10] (d4) :> earthquake(place, day) ∧ person(x) ⊃ leave(x, place, day) 
[11] •> +person: Sue, Andy, Ed, Peg, Maggie, Eileen, Chang, ... 
[12] •> +fire: (Baltimore, 2/4/03), (Watts, 8/2/67), ... 
[13] •> +earthquake: (SF, 4/8/06), (MexicoCity, 5/3/87), ... 
[14] •> ∀x, place, day. leave(x, place, day) ⊃ ¬attend_work(x, place, day) 
NM conclusions: leave(Sue, SF, 4/8/06) ∧ leave(Andy, Watts, 8/2/67) ∧ ... 

U4: Legged-ness: Selective Defeat For Individuals 

[15] •> ¬2legs(Joe) ∧ ¬4legs(Joe) ∧ ¬2legs(Spot) ∧ ¬4legs(Spot) 
NM retractions: 2legs(Joe) ; 4legs(Spot) 

U5: Work Attendance (cf. [Grosof, 1991]) 

[16] (d6) :> weekday(d) ∧ reg_employ(person, place) ⊃ attend_work(person, place, d) 
[17] (d7) :> flu(person, day) ⊃ ¬attend_work(person, place, day) 
[18] PREFER(dl,d$) 
[19] PREFER(dZ,dl) ∧ PREFER(d4, d7) ∧ PREFER(d'o,dl) 

U6: Ed is Ill; Conflict Resolved by Prioritization (cf. [Grosof, 1991]) 

[20] •> +weekday: Today, 11/12/91, ... 
[21] •> +reg_employ: (Ed, BldgA), ... 
[22] •> +flu: (Ed, Today), ... 
NM conclusion: ¬attend_work(Ed, Today) 

U7: Miscellany: Meetings and Attendance (cf. [Grosof, 1991]) 

[23] •> +Tuesday: Today, 11/12/91, ... 
[24] •> +in_group(p, 4321): Ed, Peg, Maggie, ... 
[25] •> ∀person. in_group(person, 4321) ⊃ reg_employ(person, BldgA) 
[26] •> ¬vacation(Boss(4321), d): Today, ... 
[27] •> ∀p, d. group_meeting(p, d) ∧ in_group(p, 4321) ⊃ attend_work(p, BldgA, d) 

U8: Group Meetings; Non-Layered Conflict (cf. [Grosof, 1991]) 

[28] (d9) :> in_group(p, 4321) ∧ Tuesday(d) ⊃ group_meeting(p, BldgA, d) 
[29] (d10) :> in_group(p, 4321) ∧ vacation(Boss(4321), d) ⊃ ¬group_meeting(p, BldgA, d) 
[30] PREFER(d10, d9) 
[31] PREFER(d3, d10) ∧ PREFER(d4, d10) ∧ PREFER(d5, d10) 
NM retraction: ¬attend_work(Ed, Today) ; NM conclusion: attend_work(Ed, Today) 
as formal guarantees. 

Formal Definitions: Prioritized 
Circumscription 

We define our notation for axioms from section 2 as a 
meta-language (the Circumscriptive Language of 
Defaults, or CLD for short) that, at any point in the 
update sequence, specifies a prioritized “default” 
circumscription of the form: 

$$\mathrm{PDC}(B; D; R; \mathrm{fix}\ W; Z) \;\stackrel{\mathrm{def}}{=}\; B[Z] \land \neg\exists Z'.\, \bigl(B[Z'] \land Z \prec_{(D;R)} Z'\bigr)$$ 

Here, B is the conjunction of the sentence parts of all 
of the for-sure axioms. D is the tuple of the default 
axioms’ formula parts. R is a strict partial order of 
precedence (priority). It is the transitive closure of 
the precedence relation specified by the pairwise 
comparisons in the PREFER axioms. Its domain, 
accordingly, is the set of default axiom labels. Z is 
the tuple of all mentioned predicate symbols; e.g., in 
the example, (bat, dog, mammal, 2legs, 4legs, fire, ...). 
W ⊆ Z is the tuple of predicates that are fixed. Fixing 
is a standard notion in the circumscription and 
non-monotonic reasoning literature. Fixing is part of 
the specification of non-monotonic reasoning. 
Intuitively, fixing some symbols implies that any 
formula that mentions 
only those symbols is immune to the circumscription 
operation in the sense that it can be concluded non- 
monotonically, i.e., from the circumscription, only if it 
can be concluded “monotonically”, i.e., from the 
for-sure axioms B alone. For simplicity, we also fix (do 
not vary and second-order quantify over) all function 
symbols. This assumption can easily be relaxed. This 
assumption is typical in the circumscription literature. 
Uniqueness of Names, plus Domain Closure, implies 
that functions are effectively fixed, for example. For 
the sake of simplicity, in this paper, we for the most 
part do not consider fixing of predicates, only of func- 
tions: W is empty. We omit further details about fix- 
ing to save space and to preserve focus; see [Grosof, 
1992b] for more. 

Prioritized default circumscription is a slight 
generalization of prioritized predicate circumscription 
cf. [Grosof, 1991]. We employ it and CLD to clarify the 
definitions of axiom sets and of updating, and the 
intuitive relationship to other formalisms for default 
reasoning. [Grosof, 1992b] shows as a theorem the 
equivalence of any prioritized default circumscription 
to a corresponding, abnormality-style, prioritized 
predicate circumscription, generalizing a previous 
result that appeared in [Lifschitz, 1984]. Note that our 
definition can express minimizing predicates as a 
special case: e.g., :> ¬ab_i(x), where ab_i is an 
abnormality predicate. 

We let N stand for the index tuple of D: it is just 
(isomorphic to) the tuple of the labels of the default 
axioms. I.e., in the example, after the second update, 
the elements of D[Z] are: 

λx. bat(x) ⊃ 2legs(x), 

λx. mammal(x) ⊃ 4legs(x) 

And N = (d1, d2). R(j, i) means that the default with 
label j has strictly higher priority than the default 
with label i. $\prec_{(D;R)}$ is defined as the strict version 
($\le_{(D;R)} \land \neg\ge_{(D;R)}$) of the prioritized “formula” 
pre-order 

$$Z \le_{(D;R)} Z' \;\stackrel{\mathrm{def}}{=}\; \forall i \in N.\ \bigl[\forall j \in N.\ R(j,i) \supset (\forall x.\ D_j[Z,x] \equiv D_j[Z',x])\bigr] \supset (\forall x.\ D_i[Z,x] \supset D_i[Z',x])$$ 

Here D_j and D_i refer to the j-th and i-th members, 
respectively, of the tuple D.⁸ We define the 
corresponding circumscriptive prioritized default 
theory as the set of all conclusions entailed 
(model-theoretically, in second-order logic) by the 
prioritized default circumscription.⁹ ¹⁰ We define a 
prioritized database (PDB) to be a pair, consisting of 
a CLD axiom set A (in the example, the current 
collection of the updates (U_i's)); and an associated 
prioritized database theory DB, which is some subset of 
the prioritized default circumscriptive theory C(A) 
specified by A. Here, C is the non-monotonic theory 
operator for the CLD formalism. 

Decomposition: Concepts 

As part of our strategy, we need to develop a strong 
idea of a part of a non-monotonic theory. This is im- 
portant for several reasons: 1) to define safe versus 
unsafe zones for belief revision; 2) to define relevant 
versus irrelevant context for inference (and for specifi- 
cation); and 3) to define the structure and organization 
of an overall (“global”) prioritized database. In clas- 
sical logic, we take for granted such an idea of a part 
of theory. However, the dependence of entailment on, 
in general, the entire global axiom set means that we 
have to “work for it” in NM logical systems. 

Our general concept of decomposition is applicable 
to many NM logical systems. A global theory T can 
be obtained either directly by applying the NM the- 
ory operator C to the global axiom set A, or indirectly 
(but equivalently) via decomposition. In 
decomposition, the global axiom set A is decomposed into 
an associated set of “constituent” axiom sets (the 
SA_i's). The global theory T is then equivalent to the 
combination of the corresponding sub-theories (the 
ST_i's), where each sub-theory is the result of applying 
C to a constituent axiom set: ST_i ≝ C(SA_i). 

8 For notational simplicity, we ignore the potentially dif- 
ferent arities of the various open formulas Di. 

9 See [Grosof, 1991] and [Grosof, 1992b] for more 
discussion of how prioritized circumscriptions are 
defined. Note that the prioritization p.o. R is not 
necessarily layered (stratified) (indeed, in our 
example, it is not) as it was in [Lifschitz, 1985]. 

10 In section 5, we generalize the definition above to in- 
clude the explicit “fixing” of a set of formulas, e.g., a subset 
of the predicates. [Grosof, 1992b] gives details. 
Figure 1: Conjunctive Decomposition: a conceptual flow diagram. A global theory T can be obtained either directly 
by applying the NM theory operator C to the global axiom set A, or indirectly (but equivalently) via decomposition. 
In decomposition, the global axiom set A is decomposed into an associated set of constituent axiom sets (the SA_i's). 
The global theory T is then equivalent to the conjunctive combination of the corresponding sub-theories (the ST_i's), 
where each sub-theory is the result of applying C to a constituent axiom set: ST_i ≝ C(SA_i). 


In CLD, we define T to be the result of conjunctive 
combination when T is $Cn(\bigcup_{i=1,\ldots,n} ST_i)$, where Cn 
is the monotonic consequence (theory) operator in 
classical logic. When the corresponding axiom sets are 
understood, we will say that the global theory is 
conjunctively decomposable into these slice 
sub-theories.¹¹ 

In terms of the circumscriptions, we have: 

$$\mathrm{PDC}(A) \equiv \bigwedge_{i=1,\ldots,n} \mathrm{PDC}(SA_i)$$ 

Again, when the corresponding axiom sets are understood, 
we will also speak of a circumscription being 
conjunctively decomposable into slice circumscriptions, 
e.g., for n = 2: 

$$\mathrm{PDC}(B; D; R; Z) \equiv \mathrm{PDC}(SB_1; SD_1; SR_1; Z) \land \mathrm{PDC}(SB_2; SD_2; SR_2; Z)$$ 

Figure 1 illustrates conjunctive decomposition with 
a flow diagram. 

Conjunctive decomposition is thus a kind of refor- 
mulation or representation change. The global axiom 

11 Serial combination has the flavor of a cascade: there 
is a series of phases of adding axioms and drawing conclu- 
sions, where the previous stage’s conclusions are treated 
as for-sure. Many NM inference procedures can be de- 
scribed in this manner. Details about serial decomposition 
are omitted due to considerations of space and focus. See 
[Grosof, 1992b] for more. 


set and theory (A, T) are transformed into a collection 
of constituent axiom sets and slice sub-theories: 
((SA_1, ST_1), ..., (SA_n, ST_n)). 

Most Subsets Do Not Qualify As Con- 
stituents for Decomposition: Note that, in gen- 
eral, in non-monotonic reasoning, one cannot blithely 
partition a global axiom set into a bunch of (distinct, 
or, more generally, overlapping) subsets (whose union 
is the global axiom set) any old way and get a conjunc- 
tive decomposition. This is because the axioms in one 
subset may conflict with those in another. 

E.g., consider the classic Quaker-Republican example of 
conflict in default reasoning: there are two default 
axioms, one saying that Quakers are typically 
Pacifists, and another saying that Republicans are 
typically non-Pacifists. In addition, there are two 
for-sure axioms: that Nixon is a Quaker, and that he is 
a Republican. Suppose we consider two subsets: one 
containing the Quaker axioms, and another containing the 
Republican axioms. Treating a subset as a constituent 
axiom set means drawing non-monotonic conclusions 
from it as if there were no other axioms around. Doing 
so, from the first (with Quaker) one gets the default 
conclusion that Nixon is a Pacifist; from the second, 
one gets the default conclusion that Nixon is a non- 
Pacifist. Taking the conjunction of these two “sub- 
theories” thus results in garbage: inconsistency. Yet 
the actual global theory is consistent: neither conclu- 
sion about Pacifism is sanctioned. Figure 2 illustrates. 
•> Republican(Nixon) 
•> Quaker(Nixon) 
:> Republican(x) ⊃ ¬Pacifist(x) 
:> Quaker(x) ⊃ Pacifist(x) 

Figure 2: Non-modularity: Quakers and Republicans. (Default axiom labels not shown.) 
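
The Quaker-Republican failure can be reproduced by brute force in a propositional encoding about Nixon alone. This is a minimal sketch under assumptions: it treats the two defaults as unprioritized, so an interpretation is preferred when the set of defaults it satisfies is maximal under set inclusion, and "skeptical" means true in every preferred model. The names `models`, `skeptical` and `ATOMS` are hypothetical and introduced only for this illustration.

```python
from itertools import product

ATOMS = ("quaker", "republican", "pacifist")


def models(base):
    """Enumerate truth assignments over ATOMS satisfying all for-sure axioms."""
    for values in product([False, True], repeat=len(ATOMS)):
        m = dict(zip(ATOMS, values))
        if all(axiom(m) for axiom in base):
            yield m


def skeptical(base, defaults, query):
    """True iff `query` holds in every model that satisfies a maximal
    (under set inclusion) set of defaults; no priorities are assumed."""
    ms = list(models(base))
    sat = [frozenset(i for i, d in enumerate(defaults) if d(m)) for m in ms]
    preferred = [m for m, s in zip(ms, sat) if not any(s < t for t in sat)]
    return all(query(m) for m in preferred)


is_quaker = lambda m: m["quaker"]
is_republican = lambda m: m["republican"]
quaker_default = lambda m: (not m["quaker"]) or m["pacifist"]
republican_default = lambda m: (not m["republican"]) or (not m["pacifist"])
pacifist = lambda m: m["pacifist"]
non_pacifist = lambda m: not m["pacifist"]

# Each subset treated as if it were a constituent axiom set.
quaker_only = skeptical([is_quaker], [quaker_default], pacifist)
republican_only = skeptical([is_republican], [republican_default], non_pacifist)

# The actual global axiom set.
BASE = [is_quaker, is_republican]
DEFAULTS = [quaker_default, republican_default]
global_pacifist = skeptical(BASE, DEFAULTS, pacifist)
global_non_pacifist = skeptical(BASE, DEFAULTS, non_pacifist)
```

Each subset on its own sanctions its conclusion, so conjoining the two "sub-theories" is inconsistent, while the global axiom set sanctions neither conclusion.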


Our perspective is that, in general, non-monotonicity 
means a kind of logical non-modularity: when attempting 
to draw conclusions from a subset of the global axiom 
set, one must keep in mind the context of the remainder 
of the global axiom set. If one considers that remain- 
der as an “internal” update, then that update may be 
non-monotonic. Another way to view this situation 
is that non-monotonicity means logical globality: in 
general, a non-monotonic conclusion cannot be drawn 
until the entirety of the global axiom set is considered. 


Locality: 

Suppose we can find a conjunctive decomposition in 
which for some i, the slice’s axiom set is a subset of the 
global, i.e., SA_i ⊆ A. In this case, we say that the slice 
is a clean slice. Then we know that all the remaining 
axioms (A − SA_i) in the global axiom set are irrelevant 
context, in an important sense, relative to the slice’s 
axiom set SA_i. In this case, one can soundly, and in 
an important sense completely, perform inference 
locally: considering only the axioms in SA_i, and using 
whatever standard procedures are available generally 
for the NM formalism. This is sound, because C(SA_i) 
is then a subset of the global theory. This is 
complete, in a sense, because the contribution of SA_i 
to the global consequences requires only monotonic 
inference beyond its own local (NM) consequences 
C(SA_i). By “irrelevant” above, then, we mean that one 
does not need to consider the remainder of the global 
axioms in order to do the essential non-monotonic 
aspect of the reasoning from SA_i. 

In the rest of this paper, we consider only
decompositions that are clean. ([Grosof, 1992b],
however, discusses the usefulness of decompositions
that are not clean, e.g., decompositions on the basis
of higher versus lower priority.)

Observe that in clean slicing, the constituent axiom
sets are each smaller, and thus simpler, than the
global axiom set. In prioritized default circumscription,
and in other expressively rich NM formalisms, the
computational complexity of non-monotonic reasoning
(including full forward inference and belief revision)
is worse than that of monotonic reasoning. Performing
non-monotonic reasoning (full forward inference and belief
revision) within each slice, and then combining the results
via monotonic conjunction, is thus computationally less
complex than non-monotonic reasoning in the global theory.

Partitioning Axioms As a Kind of Reformulation:

Our perspective, therefore, is that, in non-monotonic
reasoning, decomposing, e.g., partitioning (see Theorems
1 and 12), a global axiom set into constituent
axiom sets is a quite non-trivial kind of reformulation.
This is very different from the situation in classical
monotonic reasoning.

Safeties of Updating: 

Suppose, in a conjunctive decomposition, that some
constituent axiom sets are present both before and after an
update U. I.e., suppose that some of the constituent
axiom sets in a decomposition after an update U are
unchanged from (i.e., are the same as) those in a
decomposition before that update. Then we know that all of
the conclusions in the conjunctive combination of their
associated slices are safe under the update.

Hierarchy: 

We can view the conjunctive combination of a set of 
slice sub-theories as being, in turn, a sub-theory. When 
those slices are clean, then this sub-theory is itself well- 
defined as a clean slice: its axiom set is simply the 
union of those slices’. Thus we can often choose grain 
size hierarchically during conjunctive decomposition. 

Sequencing of Inference: See section 1 about con- 
currency. 

Disjoint Predicates 

Our results will all make use of the following idea of 
decomposing the specified prioritization. 

Composing Prioritization: 

The concept of prioritization over groups of defaults is
natural in the specification process for many applications:
often a group of defaults corresponds to a topic.
[Grosof, 1991] introduced, and [Grosof, 1992b] elaborates,
this idea of “composing” prioritization, in which
an overall prioritization p.o. R over the domain of
individual defaults is equivalent to the result of composing
an external prioritization p.o. RE, defined over
groups, with a tuple RI of prioritization p.o.'s, one
(RIi) per group, that each represent the prioritization
internal to that group: R = RE * RI. Groups may,
in turn, be composed of groups. Thus we may define
prioritizations of prioritizations, in hierarchical or
recursive fashion. Our example displays this structure.
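
One way to picture composed prioritization is as a two-level lookup. The sketch below is an assumption, not the definition from [Grosof, 1991]: it reads the composition as "the external prioritization decides between defaults in different groups, and the internal prioritization decides within a group". The function `compose`, the group names, and the default names are all hypothetical.

```python
def compose(external, internal, group_of):
    """Build a 'has higher priority than' test from an external relation over
    groups and one internal relation per group (assumed reading of R = RE * RI)."""
    def higher(d, e):
        gd, ge = group_of[d], group_of[e]
        if gd != ge:
            # Different groups: defer to the external prioritization.
            return (gd, ge) in external
        # Same group: defer to that group's internal prioritization.
        return (d, e) in internal[gd]
    return higher

# Hypothetical defaults, grouped by topic.
group_of = {
    "bat_2legs": "legged-ness",
    "mammal_4legs": "legged-ness",
    "meeting_default": "meetings",
}
external = set()  # no priority between the two topics
internal = {
    "legged-ness": {("bat_2legs", "mammal_4legs")},
    "meetings": set(),
}

higher = compose(external, internal, group_of)
within_group = higher("bat_2legs", "mammal_4legs")
across_groups = higher("bat_2legs", "meeting_default")
```

With an empty external relation, defaults in different groups are never compared, which is the situation the disjoint-predicates result below relies on.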

Our first result is about decomposition on the basis 
of syntactic disjointness of predicates. It captures a 
basic case of the intuition that syntactically “having 
nothing to do with each other” should imply strong 
irrelevance of the kind we discussed in the last section. 

Theorem 1 

(Clean Decomposition, given Disjoint Predi- 
cates) 

Let PDC(B; D; R; Z) be a global PDC.
Let {B1[Z1], ..., Bk[Zk]} be a partition of the base
axioms B[Z], and let {D1[Z1], ..., Dk[Zk]} be a partition
of the default formulas D[Z], where the predicate
tuples Z1, ..., Zk are a (disjoint) partition of Z. I.e.,
in terms of CLD, let there be a partition, of the base
and default axioms, where the predicates mentioned in
each element of the partition are disjoint. If a certain
condition (0) (see below) on the prioritization R is
satisfied, then

\[ PDC(B; D; R; Z) \;\equiv\; \bigwedge_{j=1}^{k} PDC(Bj; Dj; RIj; Z) \]

(Note that the Z on the right-hand side can be equivalently
replaced by Zj.) Condition (0) is defined as:
either, R is the composition of some prioritization RE
with the tuple RI of the internal prioritizations of each
partition; or, R is layered (stratified). The composition
condition for non-layered R corresponds, intuitively, to
a kind of partitionability of the prioritization. Note
the special case of empty R satisfies (0).

Proof Overview: Surprisingly non-trivial. The
essence is to use the ability to separate existential
quantifiers in the right-hand-side part of the
circumscription formula (cf. section 3). Non-layered
prioritization makes this tricky: hence the prioritization
conditions in the theorem. □

In terms of CLD, Theorem 1 tells us that syntac- 
tic disjointness implies irrelevance in the sense that 
we discussed in the last section; the decomposition by 
syntactic partition is a clean slicing. 
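
For intuition, the decomposition in Theorem 1 can be checked by brute force on a toy propositional instance. This is a sketch under strong simplifying assumptions: the prioritization is empty (which satisfies condition (0)), every atom varies, and preference is maximality of the set of satisfied defaults. The atoms a, b, c, d and the helper `preferred` are hypothetical.

```python
from itertools import product

def preferred(atoms, base, defaults):
    """Models of `base` with a maximal set of satisfied defaults (set inclusion)."""
    models = [dict(zip(atoms, v))
              for v in product([False, True], repeat=len(atoms))]
    models = [m for m in models if base(m)]
    sat = [frozenset(i for i, d in enumerate(defaults) if d(m)) for m in models]
    return [m for m, s in zip(models, sat) if not any(s < t for t in sat)]

# Slice 1 mentions only a, b; slice 2 mentions only c, d.
Z1, Z2 = ["a", "b"], ["c", "d"]
B1 = lambda m: m["a"]                       # base axiom: a
D1 = [lambda m: (not m["a"]) or m["b"]]     # default: a implies b
B2 = lambda m: not (m["c"] and m["d"])      # base axiom: not (c and d)
D2 = [lambda m: m["c"], lambda m: m["d"]]   # defaults: c, and d

# Global theory: all base axioms and all defaults together.
global_pref = preferred(Z1 + Z2, lambda m: B1(m) and B2(m), D1 + D2)

# Slices computed independently, then combined by plain conjunction.
slice1 = preferred(Z1, B1, D1)
slice2 = preferred(Z2, B2, D2)
combined = [{**m1, **m2} for m1 in slice1 for m2 in slice2]

key = lambda m: tuple(sorted(m.items()))
same = sorted(map(key, global_pref)) == sorted(map(key, combined))
```

In this instance the preferred models of the global theory are exactly the combinations of the slices' preferred models, so each slice can be reasoned about locally.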

Theorem 1 immediately yields a powerful result 
about inference. 

Theorem 2 

(Locality of Inference, given Disjoint Predi- 
cates) 

In Theorem 1, each slice j is sound and complete, 
relative to the global theory, for inference over its 
corresponding sub-language (partition of the predi- 
cates). That sub-language consists of the formulas that 
mention only the predicates Zj. This locality holds 
both for forward inference, and for backward inference 
(query-answering). Note that to perform inference us- 
ing any subset Y of the predicates Z, one need only 
work in the conjunctive combination of those slices 
whose predicates cover that subset Y. 

Theorem 1 also immediately yields a powerful result 
about belief revision. 

Theorem 3 

(Safety of Updating, given Disjoint Sub- 
Languages) 

In CLD, let the previous axiom set be partitionable 
according to Theorem 1. Let an update U consist of 
base, default, and prioritization axioms, such that the 
formula parts of the base and default axioms mention 
only predicates from a (possibly empty) subset of the 
previous partitions, and such that the global prioritiza- 
tion condition (0) is still met. Then all of the previous 
conclusions derived solely from the rest of the parti- 
tions’ slices do not require retraction. 

Application to Main Example: 

The above theorems capture the first intuition that we
discussed in section 2. At each point in the sequence of
updates, Theorem 2 implies that inference can be localized:
inferences about legged-ness can be performed
in the slice that contains only the axioms about legged-ness,
and likewise for meetings. Figure 3 illustrates the
conjunctive decomposition cf. Theorem 1 after the last
update. Theorem 3 guarantees that after each meetings
update, all of the previous conclusions drawn from
the legged-ness slice are safe, and vice versa.

Original representation (global): U1 - U8
Conjunctive decomposition (disjoint predicates):
    legged-ness: U1, U2, U4
    meetings:    U3, U5 - U8

Figure 3: Conjunctive Decomposition using Disjoint Predicates: In our main motivating example (section 2), we
conjunctively decompose the global axiom set (after the last update U8) into two slices by employing the disjoint
predicates result (Theorem 1): one slice about legged-ness, and the other slice about meetings. In the bottom half,
each inner box stands for a constituent axiom set.

Generalizations: 

Theorems 1, 2, and 3 generalize in several directions. 
Firstly, predicate (and function) symbols may overlap 
between the constituent axiom sets as long as they 
are fixed in the circumscription (see earlier discussion 
about fixing in section 3). Intuitively, it is OK to spec- 
ify some predicate (and function) symbols as fixed if 
it is OK not to infer any default conclusions express- 
ible purely in terms of those symbols. Secondly, the 
prioritization condition can be relaxed somewhat. 

Definitional Reformulation of Axioms: 

Thirdly, and most interestingly (see discussion toward
end of section 1 about source of power), one can
decompose with irrelevance (slice cleanly) as long as one
can definitionally reformulate the global axiom set to
meet the partitionability condition. (See Theorem 12.)
One interesting such case is reasoning about one
individual object, e.g., Joe in our example, at a time. (See
Theorem 16.) Often (e.g., for the legged-ness axioms
in our example), such re-formulability is easily (time
polynomial in the number of axioms) detectable
syntactically. We pursue all this in the next two sections.

Basic Definitional Reformulation of 
Axioms, One-by-One 

Next, we define a particular kind of definitional
reformulation. This kind of reformulation maps each
formula in one formulation into a correspondent formula
in another formulation, while preserving equivalence,
i.e., without loss of information. Our motivation for
considering this limited kind of reformulation is our
intended application: to disjoint describability and its
asocial-monadic special case. Why do we do the
reformulation “one axiom at a time”, i.e., one-by-one?
Much of the reason is that there is an important
difference between default / NM reasoning and monotonic
reasoning.

We take for granted in monotonic logics that a
collection of for-sure (base) axioms B1, ..., Bm can be
equivalently replaced by the axiom $B1 \wedge \cdots \wedge Bm$.
In prioritized default circumscription and most other
expressively rich NM formalisms, however, one cannot,
in general, equivalently replace the pair of default
axioms (whose default formulas are) D1 and D2 by
the default axiom (whose default formula is the
conjunction) $D1 \wedge D2$ (even in the case without priorities,
i.e., when the prioritization is empty). Informational
“grain size” of the defaults is important: having the
two separate defaults means that, for example, D2
may “succeed” (i.e., be concluded non-monotonically
from the defaults) even if D1 is “defeated” (e.g., is
violated by the for-sure information), unlike if the only
default present is $D1 \wedge D2$. We will need
equivalence-preserving (and information-preserving) reformulation
in order to apply the decomposition on the reformulated
representation back onto the original representation.
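
The grain-size point can be made concrete with a two-atom example. The sketch below is illustrative and rests on simplifying assumptions: a propositional language, empty prioritization, all atoms varying, and preference by maximal sets of satisfied defaults. The atoms p and q and the helper `concludes` are hypothetical; D1 is the default "p", D2 is the default "q", and the for-sure information is "not p".

```python
from itertools import product

def concludes(atoms, base, defaults, query):
    """True when `query` holds in every model of `base` that satisfies a
    maximal (under set inclusion) set of the given defaults."""
    models = [dict(zip(atoms, v))
              for v in product([False, True], repeat=len(atoms))]
    models = [m for m in models if base(m)]
    sat = [frozenset(i for i, d in enumerate(defaults) if d(m)) for m in models]
    best = [m for m, s in zip(models, sat) if not any(s < t for t in sat)]
    return all(query(m) for m in best)

ATOMS = ["p", "q"]
base = lambda m: not m["p"]            # for-sure information defeats D1
D1 = lambda m: m["p"]
D2 = lambda m: m["q"]
D1_and_D2 = lambda m: m["p"] and m["q"]

# Two separate defaults: D2 still succeeds although D1 is defeated.
separate = concludes(ATOMS, base, [D1, D2], D2)

# One conjoined default: it is defeated as a whole, so q is not concluded.
merged = concludes(ATOMS, base, [D1_and_D2], D2)
```

Here `separate` comes out true and `merged` false, which is why the reformulations that follow are carried out one default at a time.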

Circumscription is defined in terms of second-order
logic. We thus find it convenient and natural to define
the kind of definitional reformulation we will need in
terms of second-order logic, as well. We build on the
standard idea of a conservative extension, drawn from
the classical logical literature. In this and the next
section, we then develop several, increasingly complex
notions of definitional reformulations, in order to handle
the grouping structure in various stages of our
reformulations: groups of predicates, groups of individuals,
groups of formulas.

In this paper, we mainly address reformulations ori- 
ented around disjointness of mentioned (predicate and 
function) symbols. It is thus convenient to define our 
changes of representation in terms of changes in the 
symbols mentioned. 

First Cut at Definitional Reformulation: 

What does it mean to definitionally reformulate
theories (or formulas) while preserving equivalence? At
first glance, it simply means to introduce some
definitions (of new symbols) which logically imply (entail)
the equivalence of a theory expressed in an original set
of symbols (i.e., an original representation) to a new
theory expressed in those new symbols (i.e., a new
representation). E.g., let A1[P] be the original theory, let
U[P, Q] be some definitions of new symbols Q in terms
of the old symbols P, e.g., a conjunction of explicit
definitions:

\[ U[P,Q] \stackrel{\mathrm{def}}{=} (Q1 \equiv E1[P]) \wedge \cdots \wedge (Qm \equiv Em[P]) \]

(where m is the length of the tuple Q), and let A2[Q] be
a new theory that is equivalent to A1[P] given U[P, Q]:

\[ U[P,Q] \models A1[P] \equiv A2[Q] \]

More generally, we can permit the new representation
to use some of the old symbols; let W be the
overlap symbols between the old and the new.
Suppose A1[W, Y] is the original theory, A2[W, Z] is the
new theory, and U[W, Y, Z] is the (conjunction of)
definitions of new symbols Z in terms of W and Y, e.g.,

\[ U[W,Y,Z] \stackrel{\mathrm{def}}{=} (Z1 \equiv E1[W,Y]) \wedge \cdots \wedge (Zm \equiv Em[W,Y]) \]

(where m is the length of the tuple Z); and suppose

\[ U[W,Y,Z] \models A1[W,Y] \equiv A2[W,Z] \]

Then we call U a “putative” definitional reformulator.


Observation 4 

(Subtlety: Uninformativeness and Consistency) 
However, there is a subtlety. To us, part of the
intuition behind the idea of a definitional reformulation
is that the equivalence is non-spurious, i.e., that the
definitions themselves are not introducing information.
Unfortunately, merely requiring U to be a conjunction
of explicit definitions allows spuriosity and
informativeness.

Consider the following example. Let W be empty.
Let $Y \stackrel{\mathrm{def}}{=} (Y1, Y2)$, where Y1 and Y2 are 0-ary
predicates (see footnote 12). Let A1[W, Y] be defined as $Y1 \wedge \neg Y1$.
Let the definitions U be $(Z1 \equiv Y1) \wedge (Z2 \equiv \neg Y1)$,
where Z1 and Z2 are 0-ary predicates. Let
$Z \stackrel{\mathrm{def}}{=} (Z1, Z2)$, and let A2[W, Z] be defined as
$Z1 \wedge Z2$. Then U implies that A1 is equivalent to
A2. Yet this contravenes our intuition of a reasonable
definitional reformulation. A1 is inconsistent, i.e., is
equivalent to False. A2, by contrast, is consistent.

12 We do not use Y2 immediately, but we will use it later
when we continue this example in the discussion after
Definition 7.

Viewing the direction of reformulation from A2
to A1, in effect U is introducing some information,
namely that $Z1 \equiv \neg Z2$. The source of this problem is
that, even though U is a conjunction of explicit
definitions, U is itself not always consistent when it is viewed
in this “return direction” of the reformulation (i.e.,
from A2 to A1). Yet, to us, any notion of
equivalence-preserving definitional reformulation ought to be
symmetric, i.e., kosher in both directions: from A1 to A2
and from A2 to A1. We would, therefore, like to
impose some kind of additional constraint on U to
guarantee intuitive uninformativeness and non-spuriosity of
the equivalence between the two representations.
Below, we do this by formalizing U's consistency and its
relationship to directionality more precisely.

The idea of a conservative extension, standard in 
the classical logical literature, provides a nice notion 
of uninformativeness in terms of mentioned symbols. 

Definition 5 (Conservative Extension) 

Let A1[P] be a formula (see footnote 13) mentioning only (the tuple
of symbols) P. Let Q be (a tuple of symbols) distinct
from P. Let A2[P, Q] be a formula mentioning only
$P \cup Q$. Then we say that A2[P, Q] is a conservative
extension of A1[P] when:

\[ \forall P.\, [(\exists Q.\, A2[P,Q]) \equiv A1[P]] \]

or, equivalently, when both:

\[ \forall P, Q.\; A2[P,Q] \supset A1[P] \]
\[ \forall P.\, [A1[P] \supset (\exists Q.\, A2[P,Q])] \]

Another way to view the idea of conservatism in this
definition is that A2 “says” exactly as much about P
as A1 does. A2 in addition says stuff about Q. I.e., for
any formula G[P] mentioning only P:

\[ A2[P,Q] \models G[P] \iff A1[P] \models G[P] \]

Suppose that

\[ A2[P,Q] \stackrel{\mathrm{def}}{=} A1[P] \wedge U[P,Q] \]

Then we say that U[P, Q] is a conservatively extending
update to A1[P].

Intuitively, we can thus view a conservatively ex- 
tending update U[P , Q] as uninformative in a precise 
sense, namely about the old symbols P. 

Notation: 

Let $D \le E$ stand for the universally quantified
implication $\forall x.\, D(x) \supset E(x)$, where D and E are open
formulas with the same arity of free variables (i.e.,
are similar), and x stands for a tuple of free
individual (object) variables. Let $D \equiv E$ then be defined
analogously as the universally quantified equivalence
$\forall x.\, D(x) \equiv E(x)$. We also apply this notation to
tuples D = (D1, ..., Dm) and E = (E1, ..., Em): e.g.,
$D \equiv E$ stands for

\[ D1 \equiv E1 \wedge \cdots \wedge Dm \equiv Em \]

13 in (higher-order) classical logic

Fact 6 

(Explicit Definitions Are Conservative) 
(Conjunctions of) explicit definitions of new symbols
(e.g., predicates) are always conservatively extending
updates. I.e., in Definition 5, suppose U[P, Q] is a
conjunction of explicit definitions of each symbol in Q:

\[ Q \equiv E[P] \]

(Here, we are using the tuple $\equiv$ notation introduced
above, and applying it also to functions and terms.)
Then U[P, Q] is a conservatively extending update, for
any A1[P].
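
A short derivation shows why this holds. It is offered as a sketch, assuming A2[P, Q] is A1[P] conjoined with U[P, Q] as in Definition 5 and that U[P, Q] is the explicit definition $Q \equiv E[P]$:

```latex
\begin{align*}
\exists Q.\, A2[P,Q]
  &\equiv \exists Q.\,\bigl(A1[P] \wedge (Q \equiv E[P])\bigr)
     && \text{definition of } A2 \\
  &\equiv A1[P] \wedge \exists Q.\,(Q \equiv E[P])
     && Q \text{ does not occur in } A1[P] \\
  &\equiv A1[P]
     && \text{take } Q := E[P] \text{ as witness}
\end{align*}
```

The last step uses only that E[P] does not mention Q, so a witness for Q always exists regardless of what A1[P] says.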

Conservative Extension, Uninformativeness, 
and Directionality: 

Equipped with the idea of a conservative extension, 
we are now ready to return to the question of refin- 
ing the basic idea of definitional reformulation. In our 
“first cut” above, we found a need to formalize the con- 
straint that the putative definitional reformulator U be 
uninformative, in both directions of the reformulation. 
In Definition 5, we observed that the property that a
“definitional” reformulator U[P, Q] is a conservatively
extending update precisely expresses U's
uninformativeness, in the direction of A1 to A2, i.e., about P.
There, however, U is not really quite a reformulator
in the sense we discussed in the “first cut”, since A2
mentions not just the new symbols Q, but also the old
symbols P. However, we can extract the notion of
uninformativeness present there, i.e., the “conservatism”
in the idea of a conservative extension.

The property that U is a conservatively extending
update is:

\[ A1[P] \models \exists Q.\, U[P,Q] \]

which we can also write as:

\[ \models (\forall P.\, A1[P] \supset \exists Q.\, U[P,Q]) \]

One can view the right-hand-side as a satisfiability
(i.e., consistency) property. This satisfiability /
consistency is conditional on A1.

We take this conservativeness property as the basis
for uninformativeness of a (putative) definitional
reformulator U. However, we need the “return direction”
uninformativeness as well:

\[ A2[Q] \models \exists P.\, U[P,Q] \]

which we can also write as:

\[ \models (\forall Q.\, A2[Q] \supset \exists P.\, U[P,Q]) \]


Definition 7 

(Definitional Reformulator — Basic Case) 

We say that U[W, Y, Z] is a definitional reformulator
(basic case) between two formulas A1[W, Y] and
A2[W, Z] (where W, Y, and Z are distinct tuples of
symbols) when:

1. U implies the equivalence of A1 and A2:

\[ \models U[W,Y,Z] \supset (A1[W,Y] \equiv A2[W,Z]) \]

2. U is uninformative, i.e., conservative, in both
directions of the reformulation, i.e., with respect to A1
and with respect to A2:

\[ \models (\forall W, Y.\; A1[W,Y] \supset \exists Z.\, U[W,Y,Z]) \]
\[ \models (\forall W, Z.\; A2[W,Z] \supset \exists Y.\, U[W,Y,Z]) \]

Discussion; Directionality: 

Having the second direction, in addition to the first 
direction, of the conservativeness property in Defini- 
tion 7 rules out the nastily-behaved example that we 
discussed in Observation 4. However, the conservative- 
ness property in Definition 7 reassuringly does permit, 
for example, the following, more intuitively reason- 
able basic-case definitional reformulator: 

\[ U[W,Y,Z] \stackrel{\mathrm{def}}{=} (Z1 \equiv Y1) \wedge (Z2 \equiv \neg Y2) \]

(where the symbols are as in the example discussed in
Observation 4) for any A1, A2.

The property that U consists exclusively of (a con- 
junction of) explicit definitions ensures, in general, 
only one direction of conservativeness. 
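
Because the example uses only 0-ary predicates, both conservativeness directions can be checked by enumerating truth values. The sketch below is illustrative: W is empty, Y and Z are pairs of booleans, and the helper names (`conservative`, `implies_equivalence`, `U_bad`, `U_good`) are hypothetical.

```python
from itertools import product

BOOLS = [False, True]
PAIRS = list(product(BOOLS, repeat=2))  # all values of a pair of 0-ary predicates

def implies_equivalence(U, A1, A2):
    """Property 1: whenever U holds, A1 and A2 have the same truth value."""
    return all(A1(y) == A2(z) for y in PAIRS for z in PAIRS if U(y, z))

def conservative(U, A1, A2):
    """Property 2, both directions: (forward, backward)."""
    forward = all(any(U(y, z) for z in PAIRS) for y in PAIRS if A1(y))
    backward = all(any(U(y, z) for y in PAIRS) for z in PAIRS if A2(z))
    return forward, backward

A1 = lambda y: y[0] and not y[0]   # Y1 and not Y1 (inconsistent)
A2 = lambda z: z[0] and z[1]       # Z1 and Z2 (consistent)

# Reformulator from Observation 4: Z1 == Y1 and Z2 == not Y1.
U_bad = lambda y, z: z[0] == y[0] and z[1] == (not y[0])
# Reformulator from the discussion above: Z1 == Y1 and Z2 == not Y2.
U_good = lambda y, z: z[0] == y[0] and z[1] == (not y[1])

bad_equiv = implies_equivalence(U_bad, A1, A2)
bad_cons = conservative(U_bad, A1, A2)

# For U_good, conservativeness holds even with A1 and A2 trivially true,
# i.e., for any choice of A1 and A2.
anything = lambda v: True
good_cons = conservative(U_good, anything, anything)
```

The first reformulator passes the equivalence test and the forward direction (vacuously, since A1 is unsatisfiable) but fails the return direction, because no Y can make both Z1 and Z2 true. The second passes both directions unconditionally.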

Conditionality Versus Unconditionality of Con- 
servativeness: 

Definition 7 is perhaps too “custom” in one regard, 
however. The conservativeness property is conditional: 
it depends on the particular Al and A2. This is per- 
haps unsatisfactory intuitively, at least for some pur- 
poses, as a notion of “definitional” in “definitional re- 
formulator” . 

Alternative Definition of Conservativeness: Un- 
conditional Version: 

As an alternative definition of the basic case of defi- 
nitional reformulator, we observe that one can use a 
stronger (i.e., more strongly constrained, special case) 
notion of conservativeness instead: 

\[ \models \forall W, Y.\; \exists Z.\; U[W,Y,Z] \]
\[ \models \forall W, Z.\; \exists Y.\; U[W,Y,Z] \]

to replace the conservativeness property (2.) in
Definition 7. This “unconditional” version of the
conservativeness property does not depend on A1 and A2:
i.e., it implies that the “conditional” conservativeness
property (2.) in Definition 7 holds for any A1 and A2.

Alternative Definition of Conservativeness: 
Backgrounded Version: 

As an intermediate position between the conditional 
and unconditional versions of the conservativeness 
property, we observe that one can formulate condi- 
tionality in a somewhat abstracted fashion: in terms 
of the symbols W that are in common between the two 
representations. We will find it convenient for our later 
definitions to employ a notion of a background G[W] 
to the reformulation. One can view G[W] as, in effect,
included in both A1[W, Y] and A2[W, Z]. We then
define the “backgrounded” version of the conservativeness
property as:

\[ G[W] \models \forall Y.\; \exists Z.\; U[W,Y,Z] \]
\[ G[W] \models \forall Z.\; \exists Y.\; U[W,Y,Z] \]

In the remainder of this paper, we will use this last,
“backgrounded” version of the conservativeness prop- 
erty. We do so in order to formally simplify our later 
definitions of more complex kinds of definitional re- 
formulators and reformulations, which are oriented to- 
wards particular uses. However, the “conditional” ver- 
sion of the conservativeness property is more funda- 
mental and general, we believe, and is interesting to 
explore: we plan to do so in the future. 

No Requirement of Explicitness: 

Note that in Definition 7, we did not require U to be 
in the form of a conjunction of explicit definitions of 
new symbols in terms of old symbols. We formalized
/ summarized the “definitional” flavor of the
reformulator as, simply, U's conservativeness. Our definition
of definitional reformulator thus allows U to consist of 
implicit definitions (e.g., with recursion) and partial 
definitions (i.e., necessary and sufficient conditions). 
(Later, in our result about the asocial monadic special 
case of disjoint describability (Theorem 16), the refor- 
mulator will consist exclusively of explicit definitions, 
however.) 

Next, we define a definitional reformulation of a 
group of formulas, using a single common reformula- 
tor: one-by-one, into a new group of formulas. For this 
purpose, it is convenient to be able to abstract away 
from conditionalizing conservativeness on each of those 
formulas: we thus use the backgrounded version of
conservativeness.

Definition 8 (Group Reformulator) 

Let ET1[W, Y] and ET2[W, Z] each be a similar (see footnote 14)
tuple of formulas; these formulas may be open
or closed. We call each tuple a group. Let
U[W, Y, Z] and G[W] be closed formulas. We say that
U[W, Y, Z] is a group reformulator between ET1[W, Y]
and ET2[W, Z], given the background G[W], when:

1. U is conservative (given the background) with
respect to Y and also with respect to Z:

\[ G[W] \models \forall Y.\; \exists Z.\; U[W,Y,Z] \]
\[ G[W] \models \forall Z.\; \exists Y.\; U[W,Y,Z] \]

2. U reformulates each formula in either group into the
corresponding formula in the other group. I.e., U
implies the equivalence of corresponding member
formulas (subscripted by j) in the two groups:

\[ U[W,Y,Z] \wedge G[W] \models \forall j.\; ET1j[W,Y] \equiv ET2j[W,Z] \]

Disjoint Describability and Disjoint 
Individuals 

Next, we show how to use definitional reformulation
to generalize the disjoint predicate special case: to the
more general case of disjoint describability. More
precisely, we use definitional reformulation to transform
a disjointly describable global axiom set into a
representation that has disjoint predicates, and then to
transform back again after decomposition. Figure 4
illustrates. We show that the disjoint describability
case, like the disjoint predicate case, has a clean,
partitioning conjunctive decomposition, which, moreover,
implies interesting localities of inference and safeties
of updating. We then identify an interesting special
case of disjoint describability (asocial-monadic) that,
like the disjoint predicate case, is easily recognizable
in terms of the syntax of the starting global axiom set.
We begin with some preliminaries.

14 Terminology: By “similar”, we mean of same length,
and with same arities for their members.

Definition 9 (Syndicate Reformulator) 

We define a syndicate reformulator as a tuple of group 
reformulators that obeys an extra syndication prop- 
erty: their conjunction is also conservative. 

More precisely: Let ETT1[W, Y] and ETT2[W, Z]
each be a similar tuple of tuples of formulas; these
formulas may be open or closed. Each element of the top
level of tupling is itself a tuple of formulas cf.
Definition 8. The top level tuple is thus a syndicate whose
elements are groups of formulas.

Let UT[W, Y, Z] be a tuple of closed formulas, of the
same length as the top level tuples above. I.e., let it
consist of one formula per group. Let G[W] be a closed
(background) formula, as in Definition 8.

Below, we use i to subscript groups, and j to
subscript formulas within groups.

We say that UT[W, Y, Z] is a syndicate reformulator
between ETT1[W, Y] and ETT2[W, Z], given the
background G[W], when:

1. For each group i, UTi is a group reformulator
between ETT1i and ETT2i (given the background):

\[ \forall i.\; UTi[W,Y,Z] \wedge G[W] \models \forall j.\; ETT1ij[W,Y] \equiv ETT2ij[W,Z] \]

2. the conjunction $UC \stackrel{\mathrm{def}}{=} \bigwedge_{i} UTi$ is conservative
(given the background) with respect to Y and also
with respect to Z:

\[ G[W] \models \forall Y.\; \exists Z.\; UC[W,Y,Z] \]
\[ G[W] \models \forall Z.\; \exists Y.\; UC[W,Y,Z] \]

The reason we call the above a syndicate reformulation
is the linkage between the different groups imposed by
the conjunction's (UC's) conservative extension
property. This implies, but is not implied by, the
conjunction of the conservative extension properties for each
group's reformulator UTi.

Definition 10 

(Partitioning Syndicate Reformulator) 

We say that a syndicate reformulator cf. Definition 9
is Z-partitioning when:

\[ \forall i.\; UTi[W,Y,Z] \text{ is } UTi[W,Y,Zi] \]
\[ \forall i, j.\; ETT2ij[W,Z] \equiv ETT2ij[W,Zi] \]

where $\forall j \ne k.\; Zj \cap Zk = \emptyset$, i.e., the appearances of the
symbols Z are partitioned by group.

Figure 4: Disjoint Describability: a flow diagram of the reformulation steps involved.


Definition 11 (Disjoint Describability) 

Suppose that UT is a Z-partitioning syndicate
reformulator as in Definition 10, where for each group i,
ETT1i is defined as the concatenation of a (closed,
base) formula B1i with a tuple of (open, default)
formulas D1i, and similarly, ETT2i is the concatenation
of B2i and D2i.

Let B1 stand for the conjunction of the B1i's; and
D1 stand for the concatenation of the D1i's. Let B2
and D2 be defined similarly.

Suppose also that PDC(B2; D2; R; fix W; W, Z)
fulfills the conditions in Theorem 1 (disjoint
predicates), where the grouping, and the partition there on
Z, is the same as in UT.

Then we say that PDC(B1; D1; R; fix W; W, Y)
is disjointly describable under (definitional)
reformulation by UT[W, Y, Z], given G[W].

Theorem 12 

(Clean Decomposition, given Disjoint Describ- 
ability) 

If a PDC is disjointly describable, then it is cleanly 
conjunctively decomposable into slices corresponding 
to the partitioning grouping employed in the reformu- 
lation. I.e., then the grouping employed in the refor- 
mulation forms the basis for a clean slicing. 

More precisely: Suppose
PDC(B1; D1; R; fix W; W, Y) is disjointly describable
under (definitional) reformulation by UT[W, Y, Z],
given G[W], as in Definition 11. Then

\[ PDC(B1; D1; R; \mathrm{fix}\ W; W, Y) \;\equiv\; \bigwedge_{i} PDC(B1i; D1i; Ri; \mathrm{fix}\ W; W, Zi) \]

where $Ri \stackrel{\mathrm{def}}{=} R^{Ni}$ is the internal prioritization of the
group of defaults D1i, whose index set (tuple) is Ni.
(Equivalently, the Zi on the right hand side could be
replaced by Z.)

Proof Overview: Theorem 1 plus some lemmas
about definitional reformulation of circumscriptions.
Figure 4 illustrates the logical flow of the proof. □

Theorem 12 immediately yields results about local- 
ity of inference, using Theorem 2, and about safety 
of updating, using Theorem 3. 

Next, we consider a special case of disjoint describ- 
ability: asocial-monadic. 

Theorem 13 

(Fixed Cases Reformulation of Defaults) 

In PDC, defaults can be reformulated by relativizing 
them to fixed (-formula) cases. 

More precisely: In a $PDC(B; D^{N}; R; \mathrm{fix}\ W; Z)$,
suppose that

\[ \forall i \in N.\; B[Z] \models \forall xi.\; \bigvee_{j} Fij[Z, xi] \]

where xi is a (possibly empty) tuple of individual
(object) variables, and where, for each i, j, the (possibly)
open (elementary) formula Fij[Z, xi] is fixed relative
to the circumscription (e.g., it mentions only function
symbols; remember all functions are fixed). For each
default index i, we call each Fij a fixed case. Suppose
also that

\[ B[Z] \models \forall i, j.\; \forall xi.\; Eij[Z, xi] \equiv (Fij[Z, xi] \supset Di[Z, xi]) \]

I.e., suppose that each Eij is equivalent to the default
Di relativized to the fixed case Fij. Then

\[ PDC(B; D; R; \mathrm{fix}\ W; Z) \;\equiv\; PDC(B; E; RR; \mathrm{fix}\ W; Z) \]

where the tuple E stands for the concatenation of all
the Eij's, and where RR is defined as the composition
of R (as external prioritization) with a tuple $\emptyset T$
of empty prioritization p.o.'s. Each of $\emptyset T$'s elements
is an empty prioritization p.o. $\emptyset Ti$ that has one
element per fixed case of default i
and corresponds to (i.e., has as domain) the index set
of the (sub-)tuple Ei.

Proof Overview: The key is that each original
default pre-order is equivalently reformulated, in the
context of the circumscription's “augmentation” (i.e.,
second-order-quantified part in its definition cf. section
3), into a parallel default pre-order corresponding to
Ei. □

Definition 14 (Asocially Monadic) 

We say that a prioritized default circumscription
PDC(B; D; R; fix W; Z), or a corresponding CLD axiom
set, is asocially monadic when:

1. All predicates in Z are monadic, i.e., 1-ary (a.k.a., 
unary). 

2. The base sentence B has the form of a conjunction
of universal (see footnote 15) formulas. We will refer to these as the
base formulas (axioms).

3. Every default formula (axiom) in D is quantifier-free. 

4. No base sentence (axiom) in B, and no default
formula (axiom) in D, “mixes” individuals. I.e., in their
clausal forms, no clause contains two literals with
different arguments. Intuition: different individuals
“don't want to have anything to do with each
other”, i.e., they are “asocial”.

5. All terms appearing in the base and default formulas 
are ground, except for primitive variables. 

6. The prioritization R is either layered (e.g., parallel), 
or it is point-modular (see definition below). 

7. All (explicit) fixtures are of predicate symbols (W), 
rather than of arbitrary formulas. (In addition, as 
usual, all function symbols are fixed.) 

8. Uniqueness of Names Axioms (UNA): The base B in- 
cludes axioms enforcing the distinctness of all terms 
that appear in the base and default axioms. 

9. Besides in the UNA, equality does not appear in 
the base or default formulas. (Remember, equality, 
when viewed as a predicate, is binary, not monadic.) 
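
Some of these conditions are purely syntactic and cheap to test. The sketch below checks only conditions 1 and 4 on axioms already in clausal form; the representation (a clause as a list of `(negated, predicate, args)` literals) and all function names are hypothetical, and the remaining conditions are not covered.

```python
def is_monadic(clause):
    """Condition 1: every literal applies a 1-ary predicate."""
    return all(len(args) == 1 for _negated, _pred, args in clause)

def mixes_individuals(clause):
    """Condition 4 is violated when two literals have different arguments."""
    return len({args for _negated, _pred, args in clause}) > 1

def passes_conditions_1_and_4(clauses):
    """Single pass over all literals, so linear in the size of the axiom set."""
    return all(is_monadic(c) and not mixes_individuals(c) for c in clauses)

# bat(x) implies 2legs(x), as the clause: not bat(x) or 2legs(x); plus bat(Joe).
legged_ness = [
    [(True, "bat", ("x",)), (False, "2legs", ("x",))],
    [(False, "bat", ("Joe",))],
]
# A hypothetical clause relating two different individuals.
mixing = [
    [(True, "bat", ("x",)), (False, "2legs", ("Joe",))],
]
# A hypothetical clause with a binary predicate.
binary = [
    [(False, "attends", ("Joe", "m1"))],
]

ok = passes_conditions_1_and_4(legged_ness)
mixed = passes_conditions_1_and_4(mixing)
non_monadic = passes_conditions_1_and_4(binary)
```

A full recognizer would also need to check quantifier form, groundness of terms, the prioritization, fixtures, and the uniqueness-of-names axioms.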

Definition 15 

(Point-Modular Prioritization) 

Point-modular prioritization generalizes (i.e., the class 
includes) the prioritization that is typical in default 
inheritance networks. By “point” here, we mean an 
individual in the logical language, either named (a 
ground term, e.g. Ed) or unnamed (e.g., referred to 
by a first-order variable, e.g., x in $bat(x) \supset 2legs(x)$).
(This idea of a point can be straightforwardly gener- 
alized to a tuple of individuals (e.g., (Boss(4321), d)) 
to handle predicates / formulas with arity more than 
one; but we are only considering here the unary case in 
the context of the asocially monadic case.) By point- 
modular, we mean that the overall prioritization is 
equivalent to the composition of some external priori- 
tization (over the points) composed with a tuple of in- 
ternal prioritizations, one per point. Point-modularity 
results when the prioritization is only specified be- 
tween the same instantiations of different defaults. 

Footnote 15: Terminology: By universal, we mean without
existential quantifiers.


E.g., when the bat default has higher priority than
the mammal default at each point: (the default axiom
whose default formula is) bat(Betsy) ⊃ 2legs(Betsy)
takes precedence over (the default axiom whose
default formula is) mammal(Betsy) ⊃ 4legs(Betsy),
bat(Joe) ⊃ 2legs(Joe) takes precedence over
mammal(Joe) ⊃ 4legs(Joe), bat(Fido) ⊃ 2legs(Fido)
takes precedence over mammal(Fido) ⊃ 4legs(Fido), etc., but
there is no precedence between the defaults at different
points, e.g., between bat(Betsy) ⊃ 2legs(Betsy)
and mammal(Joe) ⊃ 4legs(Joe). Unfortunately, we
do not have space to define point-modularity in further 
detail here; it requires discussing “pointwise” prioriti- 
zation somewhat similar to that in [Lifschitz, 1988], 
and generalizing CLD to increase its expressivity with 
respect to prioritization. Note, however, that many 
point-modular prioritizations can be expressed in CLD. 
See [Grosof, 1992a] for more. 
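
The informal characterization above (priority is only specified between the same instantiations of different defaults) can be illustrated with a minimal sketch. It is not the full definition of point-modularity, which is deferred to [Grosof, 1992a]; the names `DefaultInstance`, `is_same_point_only`, and `internal_prioritization` are hypothetical and introduced only for this illustration.

```python
from typing import NamedTuple


class DefaultInstance(NamedTuple):
    """A ground instance of a default, e.g. the bat default at Betsy."""
    default: str  # name of the default schema, e.g. "bat"
    point: str    # individual it is instantiated to, e.g. "Betsy"


def is_same_point_only(priority_pairs):
    """True if every (higher, lower) pair compares instances at one point."""
    return all(hi.point == lo.point for hi, lo in priority_pairs)


def internal_prioritization(priority_pairs, point):
    """Restrict the priority pairs to those internal to a single point."""
    return [(hi.default, lo.default) for hi, lo in priority_pairs
            if hi.point == point and lo.point == point]


points = ["Betsy", "Joe", "Fido"]
# bat outranks mammal at each point, with no cross-point precedence.
pairs = [(DefaultInstance("bat", p), DefaultInstance("mammal", p))
         for p in points]
# Adding a cross-point pair violates the informal condition.
cross = pairs + [(DefaultInstance("bat", "Betsy"),
                  DefaultInstance("mammal", "Joe"))]

print(is_same_point_only(pairs))
print(is_same_point_only(cross))
print(internal_prioritization(pairs, "Joe"))
```

The per-point restriction computed by `internal_prioritization` loosely corresponds to one entry of the “tuple of internal prioritizations, one per point”; the external prioritization over points is not modelled here.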

Theorem 16 

(Decomposition by Reformulation, Individual- 
Wise) 

Suppose the PDC(B0; D0; R0; fix W; Z) is asocially
monadic (cf. Definition 14). Then the circumscription
can be cleanly sliced, i.e., conjunctively decomposed,
into its individual-wise reformulation:

$$\mathit{PDC}(B0;\ D0;\ R0;\ \mathrm{fix}\ W;\ Z) \;=\; \bigwedge_{j=1}^{m+1} \mathit{PDC}(B1_j;\ D1_j;\ R_j;\ \mathrm{fix}\ W;\ Z)$$

This individual-wise reformulation is defined as fol- 
lows. 

The basic idea of the reformulation is to divide the 
base and default axioms into groups: one group per 
named individual, plus a catch-all “remainder” group 
for all other, unnamed individuals. Some reformula- 
tion, of a relatively simple kind that is different from 
decomposition and one-by-one definitional reformula- 
tion, is involved in order to break up the quantified 
base axioms and the open defaults into these cases. 
Figure 5 illustrates this logical flow. The details of the 
overall reformulation are, however, a bit involved to 
define; bear with us. 

To begin with, we partition the base and default for- 
mulas according to which arguments appear in them. 

Let $J \stackrel{\mathrm{def}}{=} \{1, \ldots, m\}$ index the set of all ground
terms $a_j$ that appear in the base or default formulas.

Let $B0_j$ stand for the tuple of base formulas that
mention $a_j$. Each of its members we write as $B0_{jk}[Z]$.

Let $B0V$ stand for the tuple of base formulas, other
than the UNA, that mention a free variable (all of these
are universally quantified). Each of its members we
write as $\forall x.\ B0V_k[Z, x]$. Here $x$ is a single (free)
individual variable.

We treat the default formulas similarly to the base.
Let $D0_j$ stand for the tuple of default formulas that
mention $a_j$. Each of its members we write as $D0_{jk}[Z]$.

Let $D0V$ stand for the tuple of default formulas that
mention a free variable (i.e., that are open; all of these
are quantifier-free). Each of its members we write as
$D0V_k[Z, x]$. Here $x$ is a single (free) individual variable.

Figure 5: Asocially Monadic: a flow diagram of the reformulation steps involved. See also Figure 4.

Next, we reformulate the base and default formulas 
that mention a variable. 

For each $j \in J$, let $B1V_j$ stand for the instantiation
of the quantified base formulas $B0V$ to $a_j$. Each
of its members $B1V_{jk}[Z]$ is defined as the formula
$B0V_k[Z, a_j]$.

Let $\mathit{UNNAMED}[x]$ stand for the formula

$$\bigwedge_{j \in J} x \neq a_j$$

Let $B2V$ stand for the tuple of quantified base formulas
after relativization to the unnamed case. Each
of its members $B2V_k[Z]$ is defined as:

$$\forall x.\ \mathit{UNNAMED}[x] \supset B0V_k[Z, x]$$

For each $j \in J$, let $D1V_j$ stand for the instantiation
of open default formulas $D0V$ to $a_j$. Each of its
members $D1V_{jk}[Z]$ is defined as the formula $D0V_k[Z, a_j]$.

Let $D2V$ stand for the tuple of open default formulas
after relativization to the unnamed case. Each of its
members $D2V_k[Z]$ is defined as:

$$\mathit{UNNAMED}[x] \supset D0V_k[Z, x]$$

For each $j \in J$, let $B1_j$ stand for the conjunction
of (all members of) the tuples $B0_j$ and $B1V_j$.

For $j = m + 1$ (i.e., the unnamed case), let $B1_{m+1}$
stand for the conjunction of (all members of) the tuple
$B2V$ plus the UNA.

For each $j \in J$, let $D1_j$ stand for the concatenation
of the tuples $D0_j$ and $D1V_j$.

For $j = m + 1$ (i.e., the unnamed case), let $D1_{m+1}$
stand for the tuple $D2V$.

Let $R_j$ be defined as the prioritization internal to
$D1_j$, i.e., as $R^{N_j}$, where, for each $j = 1, \ldots, m + 1$, $N_j$
is the index tuple of $D1_j$.
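
The grouping just defined can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the construction used in the proof: formulas are plain string templates with a single placeholder `{x}`, `->` stands in for material implication, the UNA and the prioritizations are omitted, and the names `Formula` and `individual_wise` are hypothetical. The base fact about Joe in the usage example is likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Formula:
    """A monadic formula: `arg` is a constant name, or None if open in x."""
    template: str               # e.g. "bat({x}) -> 2legs({x})"
    arg: Optional[str] = None   # named individual, or None for a variable

    def instantiate(self, name):
        return Formula(self.template, name)

    def __str__(self):
        return self.template.format(x=self.arg if self.arg is not None else "x")


def individual_wise(base, defaults, names):
    """Group formulas into one slice per named individual plus a remainder."""
    slices = {}
    for a in names:
        # Ground formulas mentioning a, plus instantiations of open ones to a.
        b1 = ([f for f in base if f.arg == a]
              + [f.instantiate(a) for f in base if f.arg is None])
        d1 = ([f for f in defaults if f.arg == a]
              + [f.instantiate(a) for f in defaults if f.arg is None])
        slices[a] = (b1, d1)
    # Remainder slice: open formulas relativized to the unnamed individuals.
    guard = " & ".join(f"x != {a}" for a in names)
    slices["<unnamed>"] = (
        [f"forall x. ({guard}) -> ({f})" for f in base if f.arg is None],
        [f"({guard}) -> ({f})" for f in defaults if f.arg is None],
    )
    return slices


base = [Formula("bat({x})", "Joe")]  # hypothetical base fact
defaults = [Formula("bat({x}) -> 2legs({x})"),
            Formula("mammal({x}) -> 4legs({x})")]
slices = individual_wise(base, defaults, ["Joe", "Betsy"])
print([str(f) for f in slices["Joe"][1]])
print(slices["<unnamed>"][1][0])
```

Each named slice here plays the role of the pair formed by the base conjunction and default tuple for one individual, and the `<unnamed>` entry plays the role of the slice for j = m + 1, minus the UNA.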

Proof Overview: We use a first stage of reformulation
employing Theorem 13. This stage involves
what we called above an “extra” kind of reformulation:
e.g., to reformulate each open default axiom and
each quantified base axiom into a collection of “point”-case
(individual-case) axioms, plus a remainder-case
(unnamed case) axiom. Then we use a second stage
partitioning syndicate reformulation into disjoint de- 
scribability, employing Theorem 12. In that second 
stage of reformulation, we treat the UNA as back- 
ground. There, the newly introduced predicates are 
all 0-ary, except for those corresponding to the catch- 
all case. The definitional reformulator consists 
of the explicit definitions of these newly intro- 
duced predicates. There is one new predicate 
for each ground atom in the original represen- 
tation. Note that the second stage itself combines 
two kinds of reformulation: definitional reformulation, 
to transform into a representation with disjoint predi- 
cates, and conjunctive decomposition. □ 

Figure 5 illustrates the logical flow of the reformulation
steps involved; it builds upon Figure 4.

Figure 6: Conjunctive Decomposition using Asocially Monadic and Disjoint Predicates: In our main motivating
example (section 2), we can conjunctively decompose the global axiom set (after the last update U5) into two
slices by employing the disjoint predicates result (Theorem 1): one slice about legged-ness, and the other slice
about meetings. This first-stage decomposition is the same as in Figure 3. We can conjunctively decompose the
legged-ness slice, individual-wise, by employing the asocially monadic result (Theorem 16). That is, in a second
stage, we slice (more finely) within a slice that arose from the first stage. The second stage thus yields a second,
finer-grain decomposition of the global axiom set, containing the meetings slice (unchanged from the first stage)
plus each of the individual-case legged-ness slices. Together, the two stages exemplify the ability to decompose
hierarchically / recursively. Each of the named-individual / “point” slices in the second stage contains a set of
axioms that correspond to the instantiation / particularization of the original legged-ness axioms (U1, U2, and U4)
to (the case of) one named individual, e.g., Joe. Each outer box stands for a decomposition. Each inner box stands
for a constituent axiom set.

Application to Main Example: (Continued from
the discussion in section 5:) Consider our main
motivating example (about legged-ness and meetings,
from section 2). There, after the final update U5
(and, indeed, at any earlier point in the sequence
of updates), the legged-ness slice, i.e., the set of
axioms about legged-ness (U1, U2, and U4), is asocially
monadic. It can thus be conjunctively decomposed
cleanly, individual-wise. Figure 6 illustrates and
explains this decomposition. As we discussed earlier, the
definitional reformulation involved in the individual-wise
decomposition cf. Theorem 16 introduces a new
0-ary predicate for each ground atom in the original
representation; in this example, two such new
predicates are:

nbatJoe ≡ bat(Joe)
n2legsJoe ≡ 2legs(Joe)
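
As a small illustration of this step, the following sketch generates one new 0-ary predicate name per ground monadic atom. The naming scheme (prefix `n`, then the predicate, then the individual) is inferred from the two examples above and is an assumption, as are the function names.

```python
import re


def new_predicate_name(ground_atom):
    """Map a ground monadic atom such as 'bat(Joe)' to a 0-ary name."""
    match = re.fullmatch(r"(\w+)\((\w+)\)", ground_atom)
    if match is None:
        raise ValueError(f"not a ground monadic atom: {ground_atom!r}")
    predicate, individual = match.groups()
    return f"n{predicate}{individual}"


def definitional_reformulator(ground_atoms):
    """Pair each new 0-ary predicate with the ground atom that defines it."""
    return {new_predicate_name(atom): atom for atom in ground_atoms}


definitions = definitional_reformulator(["bat(Joe)", "2legs(Joe)"])
print(definitions)
```

Each dictionary entry stands for one explicit definition equating the new 0-ary predicate with its ground atom.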

Theorem 17 

(Individual-wise Locality of Inference, when 
Asocially Monadic) 

In Theorem 16, each slice j, where j is (the index of) a
named individual (cf. statement of that Theorem), is
sound and complete, relative to the global theory, for
inference over its corresponding sub-language. That
sub-language consists of the ground formulas (sentences)
in which the only individual mentioned is j
(e.g., Betsy). This locality holds both for forward
inference, and for backward inference (query-answering).
Note that to perform inference using any subset SJ of
the named individuals J, one need only work in the
conjunctive combination of those slices corresponding
to SJ.

For query-answering about a new named individual
b (named in the query), just introduce the new term
b into the set of terms that are indexed by J in the
theorem. The only additional requirement is that the 
UNA ensure its distinctness from the other named in- 
dividuals. 

Application to Main Example: Thus after each 
update, inferences about any named individual’s (e.g., 
Joe’s) legged-ness can be made by working in a slice 
axiom set that has been instantiated / particularized 
to that individual (Joe). One advantage of this is 
that simpler inference algorithms are available for such 
an expressively simpler axiom set. In this case, there
is a decidable polynomial-time procedure (see
“total-propositional” case results in [Grosof, 1992b]). By
contrast, there is no general inference procedure, even
for query-answering, yet available for the full example
(i.e., including the meetings aspect). (See next
section for discussion of inference procedures available
for prioritized circumscription.) This illustrates that
decomposition-type reformulation is useful to exploit
available / known tractable special cases to do part of
the inference in a NM theory, even when the overall
theory is intractable or undecidable (see next section
for more discussion of this point).

Theorem 16 also immediately yields a powerful re- 
sult about belief revision. 

Theorem 18 

(Safety of Updating, when Asocially Monadic)
In CLD, let the previous axiom set be asocially
monadic. Let an update U consist of base, default,
and prioritization axioms, such that the formula parts
of the base and default axioms are ground and mention
only some set of named individuals. Then all of
the previous conclusions derived solely from the slices
(according to Theorem 16) of the rest of the named
individuals are safe under the update.

Application to Main Example: 

E.g., after update Ua (mentioning only Joe and Spot), 
this theorem tells us that we do not have to re-consider 
whether the previous conclusion 2legs(Betsy) is still
sanctioned: it must be preserved. Thus we can know, 
with relatively little computational work (see discus- 
sion of complexity in next section), that most of the 
previous NM conclusions are safe. 
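
The bookkeeping this theorem licenses can be sketched as follows. The sketch assumes the theorem's preconditions already hold (an asocially monadic axiom set and a ground update) and does not check them; the function names, the dictionary layout, and the cached conclusions for Joe and Spot are hypothetical placeholders.

```python
def slices_to_recheck(update_axioms, named_individuals):
    """Named individuals whose slices the update may affect."""
    mentioned = set()
    for individuals in update_axioms.values():
        mentioned |= set(individuals)
    return mentioned & set(named_individuals)


def safe_conclusions(cached, update_axioms):
    """Keep cached conclusions whose slice is untouched by the update."""
    touched = slices_to_recheck(update_axioms, cached.keys())
    return {who: concl for who, concl in cached.items()
            if who not in touched}


# Conclusions cached per individual slice (placeholder values).
cached = {
    "Betsy": ["2legs(Betsy)"],
    "Joe": ["<conclusions about Joe>"],
    "Spot": ["<conclusions about Spot>"],
}
# An update whose axioms mention only Joe and Spot.
update = {"axiom_a": ["Joe"], "axiom_b": ["Joe", "Spot"]}

print(safe_conclusions(cached, update))
```

Only the slices for Joe and Spot would need to be recomputed; the conclusion about Betsy is carried over unchanged.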

Disjoint Groups of Individuals: 

Definition 14 and Theorems 16, 17, and 18 also gener- 
alize straightforwardly to considering disjoint groups of 
individuals, where any syntactic mixing in the axioms 
involves only individuals within the same group. 

Discussion, Conclusions, and Future 
Work 

Proof Procedures: Prioritized default circumscription
is expressively reducible to prioritized predicate
circumscription (see section 3). There exist several
backward proof procedures for fairly expressive classes
of prioritized predicate circumscription, including for
layered (stratified) prioritization [Przymusinski, 1989]
[Ginsberg, 1989] [Baker and Ginsberg, 1989] [Inoue and
Helft, 1990] [Inoue et al., 1991]. More interestingly,
[Geffner, 1989] contains a proof theory and proof pro- 
cedures which promise to be easily adaptable (using 
an equivalence theorem reported in [Grosof, 1991], de- 
tailed in [Grosof, 1992b]) to circumscription with non- 
layered prioritization. 

Related Work: Note that we emphasize updating 
with new defaults, not just new for-sure axioms, un- 
like the conditional approaches to NMR (e.g., [Kraus
et al., 1990]). The ideas and results here apply to other
NM formalisms, e.g.: Default Logic and Poole’s [1988] 
and Brewka’s [1989] systems, via the equivalence re- 
sult in [Lifschitz, 1990]; as well as Geffner’s [1989] sys- 
tem. The closest idea to conjunctive decomposition 
in the previous NMR literature is [Rathmann, 1990], 
who focussed, however, on conjunctively integrating 
heterogeneously-specified circumscriptive theories. He 
considered, moreover, only layered-priority predicate 
circumscriptions. Rathmann’s work and ours were
developed independently. We are unaware of any other
applications of reformulation to non-monotonic reasoning.

More Decompositions and Safeties: We did not 
have space here to report a number of additional results 
[Grosof, 1992b] about decompositions and their impli- 
cations for safeties of updating, including about higher 
prioritization, hypotheticals, syntactic positivity, “se- 
rial” decompositions, weaker forms of irrelevance; and 
about the relationship of decompositions to specifica- 
tion and backward inference. 

Algorithms and Automation of Our Results:
In future work, we plan to automate recognition of
decompositions and safeties of updating cf. our theorems,
and actual performance of the according reformulation.
For the disjoint-predicates and asocial-monadic
cases, we have polynomial-time algorithms to perform
this: O(n^3) time, where n is the size of the CLD
axiom set.

Exploiting Truth Maintenance: Such recogni- 
tion establishes “monotonicity” (i.e., implication) rela- 
tionships between theories and sub-theories (e.g., the- 
ory after update versus theory before update; or theory 
versus slice). We plan also to automate a generalized 
ATMS-style [de Kleer, 1986] high-level architectural 
book-keeping scheme to exploit such stored monotoni- 
city relationships to support inference and belief revi- 
sion in a prioritized database. [Grosof, 1992b] gives 
details. 

More General Cases of Disjoint Describability:
In future work, we aim to find cases of disjoint
describability that are more general than asocial-monadic,
but are still easily recognizable syntactically
(in terms of the syntax of the global axiom set). E.g., in
our main example, it would be nice to be able to
particularize the Meetings slice to the individual Ed, in
the same way that the asocial monadic result guarantees
one can particularize the Legged-ness slice to the
individual Joe. Right now, we can show this
particularization about Ed is legitimate, but our proof method
is by hand. We would like to be able to formalize and
automate a class of decompositions for which this (Ed
etc.) is an instance.

Conclusions I: See Strategy and Summary in
section 1.

Conclusions II: Analyzing Computational
Advantages of Reformulation: In future work, we
also plan to analyze in detail the computational
advantages and trade-offs involved in our decompositions
and definitional reformulations. You may be wondering
why we did not give any such computational complexity
analysis in this paper. The main reason is that
the picture is quite complicated for non-monotonic
reasoning.

Even for query-answering in propositional default 
theories without priorities, current results show worst- 
case is exponential (NP-hard) [Selman and Kautz, 
1989] [Kautz and Selman, 1989] [Selman and Levesque, 
1989]. Thus: Divide-to-conquer, i.e., seeking locality,
is clearly desirable. 

But the basic complexity results for any kind of for- 
ward reasoning with priorities, for any kind of belief 
revision, and even for most kinds of backward (query- 
answering) reasoning are not available for circumscrip- 
tion, or other NM formalisms. Known tractable cases 
are highly restricted. ([Selman and Kautz, 1989]
[Kautz and Selman, 1989] give polynomial-time backward
procedures for special cases, including restrictions
of Horn, of propositional default reasoning in
their model-preference logic and in Default Logic.
Delgrande [1991] gives a polynomial-time backward
procedure for a Horn propositional case of his conditional
logic.)

However, we believe that as these results become 
available, we will be able to show that decomposition 
and reformulation are advantageous. Our aim has been 
to develop methods that will be broadly applicable, 
and to break off a piece of the overall hard problems 
of non-monotonic reasoning. In current work, we are
addressing how to relate our results to currently known 
tractable and intractable cases. 

One clear advantage is that for many cases with 
quantification, for which worst-case is undecidable 
([Reiter, 1980] [Kolaitis and Papadimitriou, 1988]): 
We are able to reformulate some of the reasoning to 
become propositional, hence decidable. E.g., when
reasoning about individuals, for the asocially monadic
class of theories (see Theorem 16).
Abstract

Our interest is in the design of multi-agent problem-solving
systems, which we refer to as composite systems. We have proposed an
approach to composite system design by decomposition of problem 
statements. An automated assistant called Critter provides a library 
of reusable design transformations which allow a human analyst to 
search the space of decompositions for a problem. 

In this paper we describe a method for evaluating and critiquing 
problem decompositions generated by this search process. The 
method uses knowledge stored in the form of failure decomposi- 
tions attached to design transformations. We suggest the benefits of 
our critiquing method by showing how it could re-derive steps of a 
published development example. We then identify several open 
issues for the method. 


Introduction 

Our group is interested in the design of composite systems,
ones that encompass multiple agents cooperating in an ongoing
activity [Fickas & Helm, 1992] (see footnote 1). We arrived at this
interest while studying the processes of software development.
Systems analysts in the domains we studied [Fickas and 
Nagarajan, 1988] focused on policies and concerns which 
cut across human, hardware and software components. In 
composite system design, software agents are treated the 
same as human and physical agents, as components to be 
integrated together to solve larger system constraints. We 
have developed a design model, called Critter, to help a 
human designer create a composite system design [Fickas 
and Helm, 1992]. 

Figure 1 shows the place of composite system design 
within the more general system lifecycle we envision. We 
view the design process of a system as composed of four 
phases: 

1. Acquisition. The designer acquires an initial, informal 
statement of the problem in terms of text descriptions and 
diagrams. 

2. Formalization. The designer creates an initial formulation
of the problem in terms of system and constraints. The
initial system formally describes a minimal set of
assumptions about possible behavior of the target system.
The constraints formally describe the desirable behavior
in terms of the initial system.

Footnote 1: This work was supported by the National Science
Foundation under grant CCR-880485.

3. Composite system design. Given the formulation of the 
problem as initial constraints and system, the designer 
uses Critter to build a formal specification of a composite 
system for the problem. A composite system is a set of 
interacting, reactive components called agents. Each
agent is associated with a set of responsibilities,
constraints which the agent’s behavior must satisfy. If all
agents behave according to their responsibilities, the 
composite system will solve the desired problem. 

4. Implementation. The agents of the composite system are 
implemented in the appropriate “technology” according 
to their specifications. This could mean producing soft- 
ware or manufacturing hardware. It might also involve 
writing legal statutes or training manuals describing the 
responsibilities of humans playing the role of an agent. 


Figure 1. Context of composite system design. 
In designing a library circulation system, for example, the
designer first acquires an informal statement of assumptions
about the system, and constraints such as “Library patrons
get the books they want” and “Every book is accounted for”.
The designer formalizes the system and constraints. The
designer then uses Critter to design a composite system
representing the library. This formally specifies the
responsibilities of agents such as the online catalog (“Report
catalog entry if book title found”), the antitheft devices
(“Sound alarm when magnetized book passes through the
gate”), and even the library patrons (“Look in the online
catalog if the book title is known”). Finally, the library
agents are implemented. For the online catalog and
antitheft device, this would involve writing or acquiring
software and hardware. “Implementing” the library patron
implies writing regulations and guidebooks to inform
patrons of their role.

We have begun to formalize an approach to phase 3, 
composite system design, by decomposing problem state- 
ments. The designer incrementally decomposes the global 
constraints in the initial problem statement into the con- 
junction of more manageable subconstraints. The designer 
then assigns responsibility for these constraints to particu- 
lar agents. For example, the designer of a library system 
could decompose the global constraint “Library patrons 
get the books they want” into “Library patrons can find the
books they want” and “Library patrons can get the books
they find.” The patron and the online catalog agents are
assigned responsibility for the former constraint; the 
patron and the library staff agents are assigned responsi- 
bility for the latter. [Feather, 1987] illustrates the approach 
by an informal example. 

Critter includes a library of formally-represented com- 
posite system design tactics, and a suite of tools for auto- 
mated evaluation and critiquing of the designs generated. 
To incorporate the decomposition method into Critter, we 
need to (1) identify and formalize general tactics for 
decomposing problem statements, and (2) identify knowl- 
edge which Critter could use to critique problem decom- 
positions. 

This paper focuses on the latter problem, that of critiqu- 
ing problem decompositions. We illustrate a method for 
generating critiques, by showing how it rationalizes spe- 
cific steps in a published development example [Feather, 
1987]. In that example, Feather informally derived an
elevator system design from the global constraints of never
unnecessarily delaying passengers, and never moving pas- 
sengers further from their destination. The development 
was guided by Feather’s intuitions of the problem, and his 
domain knowledge. We show how we can capture some of 
this knowledge, in the form of a library of failure scenar- 
ios. We then discuss the research issues raised by this 
example. 

Our work addresses the workshop in two respects: 

1. We propose general techniques for evaluating problem 
decompositions in multi-agent systems. These tech- 
niques may find use beyond our interests, in formulat- 
ing problems for multi-agent planning or for 
distributed AI systems. 


2. The evaluation approach we propose in this paper 
requires techniques for storing and using compiled 
abstractions, specifically abstract plans. This workshop 
may identify research we can apply to our approach. 

Searching for decompositions 

In this section, we outline the Critter composite system 
design model, and its support for synthesizing problem 
decompositions. 

Critter treats composite system design as search in a 
state space (Figure 2). A typical search algorithm has the 
following components: 

• A state space representation 

• A set of search operators for moving between states 

• A solution checker which recognizes satisfactory 
states. 

• A heuristic evaluator which identifies promising states. 

• A search manager which maintains a record of states 
visited and operators applied. 


Figure 2 Composite design search. 



Each state in Critter’s search space represents a
complete composite system design for the problem at hand.
The “search operators” which move from state to state are
design transformations stored in Critter’s knowledge base.
The solution states in the search are acceptable composite
system designs; Critter provides critiquing tools to help
identify these.
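
The five search components listed above can be made concrete with a minimal, generic sketch. It is not Critter's implementation (which, as described below, is interactive and IBIS-based); the function `interactive_search`, the `choose` callback standing in for the human designer, and the toy library constraints are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def interactive_search(
    initial,
    operators: Callable[[object], Iterable],
    is_solution: Callable[[object], bool],
    choose: Callable[[list], int],
) -> Optional[object]:
    """Generic state-space search; `choose` picks which state to expand."""
    visited = [initial]   # search manager: record of states visited
    frontier = [initial]
    while frontier:
        index = choose(frontier)   # heuristic evaluation, left to the caller
        state = frontier.pop(index)
        if is_solution(state):     # solution checker
            return state
        for successor in operators(state):   # search operators
            if successor not in visited:
                visited.append(successor)
                frontier.append(successor)
    return None


def decompose(state):
    # Hypothetical transformation: split one constraint into two.
    if "GetWantedBooks" in state:
        rest = tuple(c for c in state if c != "GetWantedBooks")
        yield rest + ("FindWantedBooks", "GetFoundBooks")


assignable = {"FindWantedBooks", "GetFoundBooks", "AccountForBooks"}
result = interactive_search(
    ("GetWantedBooks", "AccountForBooks"),
    decompose,
    lambda s: all(c in assignable for c in s),
    lambda frontier: 0,
)
print(result)
```

Here a design state is reduced to a tuple of constraint names; a fuller representation would also carry the system portion.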

The last two components, heuristic evaluation and 
search management, are beyond the scope of our research 
at present. For heuristic evaluation, we rely on the human 
designer. Our studies of composite system design heuris- 
tics [Feather, Fickas, and Helm, 1991] [Fickas, Feather, 
and Helm, 1991] suggest that this task will have to remain 
with the designer in the foreseeable future. Support for 
human evaluation of design operators is the focus of other 
research [Johnson and Feather, 1991]. As for search man- 
agement, Critter is implemented using an extended form 
of IBIS [Conklin and Begeman, 1988] that provides for 
separate design states. Critter provides functions for 
searching and backtracking in this space. In our current
implementation, new states are generated by hand-simulating
operator application using an editor.

In the remainder of this section, we discuss Critter’s 
support for the first three search components: state repre- 
sentation, transformations, and solution checking. 

Design states 

Figure 3 informally represents an initial state for the eleva- 
tor design problem we use to illustrate our design 
approach. A state (hereafter “design state”) in Critter’s 
design search space has two parts: 

1. System. The system portion defines the space of possible
behaviors of the current composite system design.
2. Constraints. The constraint portion of a design state
defines the subset of possible behaviors which are
viewed as legal or desirable.

The system portion of a design state represents the space
of possible behaviors of the composite system. It specifies a
set of objects, a set of primitive relations, and a set of 
actions which can add or delete object tuples from the 
relations. The system is thus similar to a planning domain 
for a STRIPS-like planner. 

Relations and actions in the system portion are also
labelled by agents. Agents in our model are simple
reactive components. A relation labelled by an agent can be
sensed by that agent; an action labelled by an agent can be 
controlled by that agent. 

A behavior is a sequence of actions, each action 
labelled by its controlling agent. A prefix of a behavior 
represents the intermediate state of the composite system 
during its operation; to avoid confusion with design states, 
we will refer to execution states of the composite system 
as “points.” As with planning domains, the system portion 
is non-deterministic; more than one action may be possi- 
ble at a given point. 

The system portion in Figure 3 includes two classes of
agents, an elevator and a set of passengers. Each passenger
controls its own actions of entering and exiting elevators. 
A passenger can sense which floor it is on, and whether or 
not it is in a given elevator. Passengers also have a destina- 
tion (not shown in the figure), which they know. The 
unique elevator controls its action of moving from floor to 
floor. It also can wait at a floor (not shown in the figure). 
The elevator can sense whether it is on a given floor. 

The constraint portion of a design state is composed of a 
set of constraints. Each constraint is a predicate which is 
true or false for each behavior generated by the system 
portion. A constraint may refer to either relations or 
actions in the system portion. 

The constraint portion of Figure 3 includes two con- 
straints: 

1. NeverFurther: Elevator passengers should not move
further from their destinations. 


2. NoDelays: Passengers should not be unnecessarily 
delayed. This means that at each point in the elevator’s 
behavior, it must either move, take on, or drop off pas- 
sengers, unless no passengers exist. 

Agents in the system portion can be assigned responsibil- 
ity for constraints. If an agent has been assigned responsi- 
bility for a constraint, that agent must act to satisfy the 
constraint. The agent must control its actions so that all of 
the behaviors it generates satisfy the constraint, regardless
of the actions of other agents. We call a constraint which is 
the responsibility of some agent an “assigned constraint.” 
The legal behaviors of a composite system design are 
all sequences of actions which can be generated by the 
agents in its system portion, and which satisfy all of the 
constraints and responsibility assignments of the con- 
straint portion. 
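
A design state as described in this section might be represented along the following lines. This is a sketch under simplifying assumptions: the system portion is replaced by an explicit finite list of behaviors rather than objects, relations, and actions; responsibility assignment is not modelled; and the class names and the toy `NeverDown` constraint are hypothetical (it is not the paper's NeverFurther).

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Action(NamedTuple):
    agent: str   # controlling agent, e.g. "elevator"
    name: str    # action name, e.g. "move"
    args: tuple  # action arguments, e.g. (1, 2) for floors


@dataclass
class DesignState:
    behaviors: list   # stand-in for the system portion: tuples of Action
    constraints: dict = field(default_factory=dict)  # name -> predicate

    def legal_behaviors(self):
        """Behaviors of the system portion that satisfy every constraint."""
        return [b for b in self.behaviors
                if all(check(b) for check in self.constraints.values())]


def never_down(behavior):
    """Toy constraint: no move action goes to a lower floor."""
    return all(a.args[0] <= a.args[1] for a in behavior if a.name == "move")


up = (Action("elevator", "move", (1, 2)),)
down = (Action("elevator", "move", (2, 1)),)
state = DesignState([up, down], {"NeverDown": never_down})
print(state.legal_behaviors())
```

Each constraint is a predicate over whole behaviors, matching the description of a constraint as true or false for each behavior generated by the system portion.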

Figure 3 Initial state of the elevator problem. [Figure body not recovered; surviving panel labels: Constraints, NeverFurther.]
We have represented the system portion informally,
which is adequate for the purposes of this paper. Critter
represents the system portion of design states in a
Numerical Petri Net [Wilbur-Ham, 1985]
notation, extended to include agents.

The constraints are written in a linear-time, quantified 
temporal logic extended to include constructs for responsi- 
bility assignment [Dubois, 1990]. For the most part, the 
constraint notation used in this paper is simply the predi- 
cate calculus, except on the following points: 
• Variables appearing in a constraint are universally
quantified unless otherwise indicated.

• Actions appear as predicates in constraints. The 
expression move(f1, f2) in the NoDelays constraint, for
instance, states that “The elevator moves from f1 to f2
at the current point of the system’s behavior.” Ordinary 
predicates are capitalized to distinguish them from 
actions. 

• Temporal logic operators reference future and past 
points in the system’s behavior. The only construct we 
use in this paper is the * operator, which denotes the 
next point. Thus, the expression At(f1) & Dest(p, f3) &
*At(f2) can be read “The elevator is at f1 and passenger
p has destination f3 and at the next point the elevator is
at f2.”

• Constraints can include responsibility assignment 
operators. [Feather, 1987] and [Dubois, 1990] give a 
formal semantics for this construct; we use it infor- 
mally throughout. 

• The notation C[t/t’] denotes the constraint C with all 
occurrences of t replaced by t’. Thus, the expression
NeverFurther[p/p1] denotes the NeverFurther constraint
with all occurrences of p replaced by p1.
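
Two of the notational points above, the next-point operator and the substitution C[t/t’], can be illustrated with a small sketch. It assumes a behavior is summarized as a list of elevator floors, one per point, and treats substitution as purely textual whole-word replacement that ignores variable binding; the function names and the predicate `In(p)` are hypothetical.

```python
import re


def at_then_next_at(trace, i, floor, next_floor):
    """Check At(floor) & *At(next_floor) at point i of a trace of floors."""
    if i + 1 >= len(trace):
        return False  # assumption: with no next point the conjunct fails
    return trace[i] == floor and trace[i + 1] == next_floor


def substitute(constraint, old, new):
    """C[old/new]: replace whole-word occurrences of `old` by `new`."""
    return re.sub(rf"\b{re.escape(old)}\b", new, constraint)


trace = [1, 2, 2, 3]  # elevator floor at each successive point
print(at_then_next_at(trace, 0, 1, 2))
print(at_then_next_at(trace, 1, 1, 2))
print(substitute("Dest(p, f3) & In(p)", "p", "p1"))
```

The whole-word match keeps the replacement from touching names that merely contain the letter p, but a real implementation would operate on parsed constraints rather than strings.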

Design transformations 

Critter has a library of design transformations that func- 
tion as operators in the search for an acceptable composite 
system design. Each design transformation has a pattern 
which matches against parts of an existing design state, a 
result which generates a new design state, and a list of
conditions called domain assumptions that must hold for 
the transformation to apply (we do not discuss domain 
assumptions in this paper). We will represent the pattern 
and result of transformations as Prolog-like clauses. 

Transformations are applied interactively. The human 
designer selects a transformation to apply, and matches the 
pattern of the transformation to components of the current 
design state. The system then generates a new design state 
incorporating the result of the transformation. 

In design by problem decomposition, most of the trans- 
formations applied are of the following form: 
pattern: constraint(C).
result: constraint(C1 & ... & Cn).

C in the pattern is a constraint. The transformation gen-
erates a new state where C is replaced by a new constraint
C1 & ... & Cn that entails C. This in turn may be decom-
posed into subconstraints.
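
The pattern/result form above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names DesignState and decompose, and the subconstraint labels, are hypothetical and are not taken from Critter, and the sketch does not check that the subconstraints entail the original constraint (that judgment stays with the designer).

```python
from dataclasses import dataclass, field


@dataclass
class DesignState:
    """Hypothetical design state: a set of named constraints."""
    constraints: set = field(default_factory=set)


def decompose(state, pattern, result):
    """Replace constraint `pattern` with the subconstraints in `result`.

    Returns a new DesignState; the input state is left unchanged.
    Whether the conjunction of `result` entails `pattern` is NOT checked.
    """
    if pattern not in state.constraints:
        raise ValueError(f"pattern {pattern!r} does not match the design state")
    new_constraints = (state.constraints - {pattern}) | set(result)
    return DesignState(new_constraints)


initial = DesignState({"NeverFurther", "NoDelays"})
after = decompose(initial, "NeverFurther",
                  ["NF_on_enter", "NF_on_move", "NF_on_exit"])
```

Returning a fresh state rather than mutating the old one mirrors the idea that each transformation generates a new design state in the search.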

When the designer judges that the constraints have been
decomposed into sufficiently simple subconstraints, she
assigns responsibility for each of the subconstraints to a
single agent. As described above, assigning responsibility
for a constraint C to an agent requires that agent to limit its
actions so that C is met, regardless of the actions of other
agents in the system.

Finally, the designer applies transformations to unfold 
the assigned constraints onto the preconditions of actions 
in the system portion. The designer may also have to use 
low-level design editing transformations to change the 
details of actions and relations in the system. 

Our main interest is in the transformations for decom- 
position of constraints and assignment of responsibility. 
As an example, one class of decomposition transformation 
used in this paper is Zone Defense. Intuitively, Zone 
Defense decomposes a constraint by 

1. Selecting an object. 

2. Dividing the object’s lifetime into “zones”, and 

3. Splitting the constraint into subconstraints based on the
“zone” the object is in.

More formally, given a constraint C and a universally
quantified variable v, we decompose C into subconstraints
based on possible states of objects to which v can be bound.
The application of Zone Defense to the NeverFurther
constraint of the elevator problem is as follows:
pattern: constraint(NeverFurther), uv(p, NeverFurther).
result: constraint(∀(p1)
    ∃f enter(p1, f) => NeverFurther[p/p1]
    &
    ∃f1, f2 move(f1, f2) => NeverFurther[p/p1]
    &
    ∃f exit(p1, f) => NeverFurther[p/p1]
    &
    (∃f enter(p1, f) v ∃f1, f2 move(f1, f2)
     v ∃f exit(p1, f))).

Intuitively, to ensure that passengers never move further
from their destination, we can ensure that the constraint
holds when the passenger enters an elevator, when an ele-
vator moves, and when the passenger exits the elevator.
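
As a rough illustration of how such a decomposition could be generated mechanically, the following Python sketch builds the subconstraints as strings from a list of “zone” patterns. The function name zone_defense and the string encoding are assumptions made for this example only; Critter's actual representation is not described at this level of detail.

```python
def zone_defense(constraint, var, zones):
    """Build the Zone Defense subconstraints as strings.

    constraint: name of the constraint being decomposed.
    var: the universally quantified variable chosen for the split.
    zones: action patterns, written over the fresh variable var + "1".
    """
    fresh = var + "1"
    # One guarded copy of the constraint per zone.
    guarded = [f"{zone} => {constraint}[{var}/{fresh}]" for zone in zones]
    # Final conjunct: the zones together cover the object's lifetime.
    coverage = "(" + " v ".join(zones) + ")"
    return guarded + [coverage]


subconstraints = zone_defense(
    "NeverFurther", "p",
    ["∃f enter(p1, f)", "∃f1, f2 move(f1, f2)", "∃f exit(p1, f)"],
)
```

Each zone yields one guarded copy of the constraint, and the last element records that the zones are meant to cover every point of interest.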

Having broken NeverFurther into more manageable 
subconstraints, the designer can next assign responsibility 
for one of the subgoals to the elevator. The only actions the
elevator controls are “move” and “wait”, so we separate
these subconstraints of the decomposition, and assign 
them to the elevator with the Limit Each Action transfor- 
mation. The instantiation of this transformation on the 
move action reads as follows: 
pattern: constraint(∀(p1)
             ∃f1, f2 move(f1, f2) => NeverFurther[p/p1]),
         agent(elevator).
result: constraint(∀(p1)
            responsible(
                ∃f1, f2 move(f1, f2) => NeverFurther[p/p1],
                agent(elevator))).

This transformation requires that the elevator control each
move action so that NeverFurther holds, regardless of the
actions of the passengers. Unfortunately, there is no way
for both NeverFurther and NoDelays to be met if
NeverFurther is assigned to the elevator as shown here. Two
passengers going in different directions can enter the elevator
and leave the elevator no choice but to violate either
NeverFurther or NoDelays. We discuss this example further
below.

Detecting solution states 

A solution state in Critter’s search is a design state where 
the system portion does not generate any behaviors which 
violate the constraints in the constraint portion. Critter 
includes analysis tools to help the analyst identify solution 
states. In [Fickas and Helm, 1992], we discuss several of 
these analysis tools and trade-offs between them. In this 
paper, we will discuss mainly the OPIE planning tool 
[Anderson and Fickas, 1989]. The system portion of a 
design state is effectively a planning domain. OPIE is a
planner which shows that a design state is not a solution
by producing a plan incorporating actions from the system
portion for violating one or more constraints. We refer to
such a plan as a failure scenario.

For example, to show that the initial elevator design 
state in Figure 3 is not a solution state, OPIE can generate 
a plan for violating the NeverFurther constraint from an 
initial point supplied by the analyst (+ indicates a relation 
added, - indicates a relation deleted): 

Initial. On(p, 1), At(1), D(p, 2);

1. enter(p, 1): -On(p, 1) +In(p);

2. move(1, 3): -At(1) +At(3);

» Violation of NeverFurther «

At(1) & In(p) & D(p, 2) & *At(3)
& ¬Between(1, 3, 2)

This illustrates the general style of solution testing in Crit-
ter: we focus on identifying classes of behaviors or scenar-
ios which violate the constraints, rather than verifying that
the constraints are met. In the next section, we discuss 
some of the benefits of this approach. We also identify 
some of its limitations, and suggest how to address those 
limitations in design by decomposition. 
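
The failure scenario shown earlier can be replayed with a small Python sketch. This is not OPIE: it does no planning, it only executes a given plan and reports where NeverFurther breaks. The state encoding, the function names, and the reading of Between(a, b, c) as “floor b is a different floor from a and lies on the way from a to c” are assumptions made for the example.

```python
def between(a, b, c):
    """Assumed reading of Between(a, b, c): b != a lies on the way from a to c."""
    return a != b and min(a, c) <= b <= max(a, c)


def run_plan(state, plan):
    """Apply (action, args) steps to a state dict and return the number of
    the first step that violates NeverFurther, or None if none does."""
    for step, (action, args) in enumerate(plan, start=1):
        if action == "enter":
            passenger, floor = args
            state["on"].pop(passenger)        # -On(p, floor)
            state["in"].add(passenger)        # +In(p)
        elif action == "move":
            src, dst = args
            # NeverFurther: no passenger inside may be carried away
            # from their destination.
            for passenger in state["in"]:
                if not between(src, dst, state["dest"][passenger]):
                    return step
            state["at"] = dst                 # -At(src) +At(dst)
    return None


initial = {"on": {"p": 1}, "in": set(), "at": 1, "dest": {"p": 2}}
plan = [("enter", ("p", 1)), ("move", (1, 3))]
violated_at = run_plan(initial, plan)
```

With the plan from the text, the violation is reported at the move step, since floor 3 does not lie between floor 1 and the passenger's destination, floor 2.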

Critiquing with failure scenarios 

Critiquing composite system design states by failure sce- 
narios offers two benefits for design: 

1. Diagnosis. A scenario is a specific behavior of the sys-
tem which violates a constraint. The designer can use
this behavior to diagnose the problems of the current 
design state and identify potential solutions. 

2. Validation. The system portion of a design state is
effectively a model of what is possible in the design
domain. If a scenario generated from that model is
counterintuitive or unlikely in the domain, this is a hint
that the model is too weak.

Our goal is to gain these benefits for design by problem 
decomposition. In this section, we suggest an approach to 
critiquing problem decompositions, and demonstrate the 
approach by showing how it could reproduce steps taken 
in a published composite system design derivation. 

Synthesizing an approach 

Planning over the system portion is not necessarily the 
best way to generate failure scenarios for decompositions, 
or for composite system designs in general. The planner 
cannot judge the likelihood or importance of the failure scenarios
it generates. Consequently, it generates many scenarios
with marginal value for design. More seriously, a designer 
can miss important failure scenarios in a design problem 
by “naive specification” of the problem. The planner relies 
entirely on the information in the design state to generate 
critiques. This knowledge may be incomplete or incorrect 
with respect to the design domain. The designer can 
exclude a particular failure, even a common one, by not 
including actions in the system portion which allow the 
planner to generate that failure. For example, the designer 
of a library can miss the possibility of books being stolen, 
by not encoding a “steal book” action in the initial design 
state. 

A critic with domain knowledge can focus more quickly 
on serious problems, and can recognize problems even in 
naive specifications. We describe a domain-specific critic 
called SKATE for library design in [Fickas and Nagarajan, 
1988]. SKATE has a case base of 1) library designs, 2) 
constraints they meet or violate, and 3) failure scenarios 
for those designs. Given a proposed design and a set of 
constraints, SKATE retrieves designs from its case base 
that match features of the proposed design, and that vio- 
late the proposed constraints. It then runs failure scenarios 
from the retrieved designs to demonstrate the problems. 
Given a library design including unrestricted checkout of 
books, for instance, and a constraint “users have a large 
selection of books to choose from”, SKATE retrieves a 
design case with unrestricted checkout. It then executes a 
stored failure scenario of a “run” on the library, in which 
unrestricted checkout is used to strip the shelves bare. 

SKATE’s case base points it directly to well-known 
library failure scenarios, avoiding the problem of generat- 
ing marginally useful scenarios. SKATE also avoids the 
problem of naive specifications. The failure scenarios 
SKATE generates are not restricted to using the actions 
and relations specified in the proposed library design. 
They can also include “environment” actions such as stealing
or destroying books, which a designer might not specify 
but which are known to cause problems in the library 
domain. 

SKATE, however, suffers from a limited ability to
match designs against cases. In general, it is hard to match 
the features of one arbitrary specification to another [Rob- 
inson, 1990]. SKATE requires the user to manually map 
features of the proposed library design into features used 
in SKATE’s case base. This task is onerous and error- 
prone; important critiques can be missed by user mistakes 
in the mapping process. 

One solution proposed by Fickas and Nagarajan is to 
integrate matching more closely with the process of pro- 
ducing designs. They suggested that the proposed design 
be generated by a domain-specific editor, equipped with a
collection of library components appearing in the case 
base. In effect, this limits the designer to producing 
designs SKATE knows how to critique. 

Based on these considerations, we propose the follow- 
ing approach which integrates the approaches of OPIE and 
SKATE: 

1. We will use Critter’s transformation library in place of 
the case base of SKATE. Each decomposition transfor- 
mation has an attached set of failure scenarios repre- 
senting its typical defects. Critter thus plays the role of 
the domain-specific editor proposed by Fickas and 
Nagarajan. 

2. Critter matches failure scenarios when it applies a 
transformation. Matching is simpler, compared to 
SKATE, because the instantiation of the transformation 
itself guides the matching process. 

3. Critter critiques a design state using the OPIE planner. 
OPIE produces plans by specializing and refining pre- 
viously matched failure scenarios. 

This approach addresses the problem of marginally useful 
scenarios by storing a library of typically useful scenarios 
on transformations, and using these scenarios to focus the 
planner. Our study of failures in multi-agent systems [Fic- 
kas, Feather, & Helm, 1991] suggests that we can find 
such characteristic failure scenarios for problem decompo- 
sitions. The approach also addresses the naive modelling 
problem by allowing failure scenarios to introduce new 
relations and actions into the design state being critiqued. 
As in SKATE, these “environment” components represent
knowledge of well-known problems that crop up in multi- 
agent systems. 

To illustrate this approach, we next show how critiques 
generated this way could anticipate two design steps 
which occurred in the composite system design develop- 
ment described in [Feather, 1987]. 

Focusing on a decomposition failure 
Recall that Feather’s elevator design problem had two ini- 
tial constraints: 

1. Passengers should never move further from their desti-
nation (NeverFurther).

2. Passengers should not be unnecessarily delayed
(NoDelays).

From the constraint that passengers never move further 
from their destination, the designer in Feather’s example 
“chooses the implication” that passengers in the same ele- 
vator must be travelling in the same direction. We show 
how a failure scenario can focus the planner to reproduce 
this design step. 

Earlier we showed a development step which assigned 
the NeverFurther goal to the elevator. This step used a 
transformation called Limit Each Action. As noted above, 
this assignment requires the elevator to satisfy NeverFur- 
ther for all combinations of passengers and floors, regard- 
less of prior actions of the passengers involved. Critter can 
generate an interesting counterexample to this constraint 
using a scenario attached to the Limit Each Action trans- 
formation. The attached scenario is called “incompatibility 
conspiracy”. The abstract incompatibility conspiracy sce- 
nario requires that: 

1. There are two agents in the system portion whose state 
can affect the truth of the constraint assigned by the 
transformation. 

2. These two agents can act to reach a state S where an 
application of the action A will fail to satisfy the con- 
straint for either one agent, or for the other. For the 
assigned constraint C and limited action A, we can 
compute the conditions on the state S the conspiring 
agents must reach. Specifically, we regress ∃a1, a2 ¬(C[a1] &
C[a2]) through the action A.

Instantiating the scenario on the application of Limit Each 
Action, we get a goal of generating a state where: 

• There are two passengers in an elevator on a floor f1

• The two passengers have destinations f3, f4

• No floor f2 exists such that Between(f1, f2, f3) &
Between(f1, f2, f4)

It remains for the planner, OPIE, to try to extend this min- 
imal “scenario” into a plan. This requires a considerable 
effort on OPIE’s part. If such a plan can be found, how- 
ever, it provides a motivation for the requirement that pas- 
sengers only enter the elevator with compatible passengers 
- passengers travelling in the same direction. 
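
For a small building, the goal states described by the instantiated scenario can simply be enumerated. The Python sketch below does this by brute force; it does not reproduce OPIE's regression or planning. The function names and the reading of Between(a, b, c) as “b is a different floor from a and lies on the way from a to c” are assumptions for illustration.

```python
from itertools import product


def between(a, b, c):
    """Assumed reading of Between(a, b, c): b != a lies on the way from a to c."""
    return a != b and min(a, c) <= b <= max(a, c)


def conspiracy_states(floors):
    """Yield (f1, f3, f4): an elevator floor and two destinations for which
    no single move from f1 satisfies NeverFurther for both passengers."""
    for f1, f3, f4 in product(floors, repeat=3):
        if f1 in (f3, f4):
            continue  # skip passengers who are already at their destination
        if not any(between(f1, f2, f3) and between(f1, f2, f4)
                   for f2 in floors):
            yield (f1, f3, f4)


states = sorted(conspiracy_states([1, 2, 3]))
```

In a three-floor building the only such states have the elevator on the middle floor with one passenger bound up and the other bound down, which matches the informal observation about passengers travelling in different directions.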

Using an abstract failure scenario thus allowed the plan- 
ner to recognize a critical deficiency, one which Feather 
deduced informally in his example. 

Critiquing a naive communication model 

In another step of Feather’s development of the elevator 
problem, passengers have been assigned to enter the ele- 
vator when a suitable one arrives at the passenger’s floor. 
The elevator has been assigned to take passengers to their 
destination when they enter. From this, the designer in 
Feather’s example derives the constraint that the passen- 
gers communicate their presence on entering the elevator. 
We show how an abstract failure scenario could lead a 
designer to this communication protocol, by introducing
environment actions and relations which cause a stereo- 
typical breakdown of communication. 

The starting point for this development is the NoDelays 
goal, which requires that the elevator must either move or 
load and unload passengers when any passenger is 
present. The designer applies a macro-transformation
called Sequential Split to the NoDelays goal. This trans- 
formation combines a Zone Defense operator with respon- 
sibility assignment. It subdivides the task of moving 
passengers into sequential zones, based on the status of the 
passenger. In particular, the designer uses Sequential Split 
to make passengers responsible for NoDelays when the 
elevator arrives at a floor. Responsibility passes sequen- 
tially to the elevator once the passenger enters. The instan- 
tiated version of Sequential Split expresses this formally: 
pattern: constraint(
             On(p, f) & At(f) => NoDelays),
         agent(p), agent(elevator).
result: constraint(
            (On(p, f) & At(f) => Responsible(p, enter(p, f)))
            &
            (In(p) & At(f1) =>
                Responsible(elevator, ∃f2 move(f1, f2)))),
        agent(p), agent(elevator).

Note that the requirement that the elevator moves, coupled 
with the NeverFurther constraint, ensures that the passen- 
ger will eventually arrive at its destination. 

Our studies of transportation system failures suggest 
that sequential decompositions, while common, frequently 
fail due to “hand-off errors”. In one hand-off failure sce- 
nario, for instance, the agent responsible for the second 
half of a sequential decomposition fails to pick up where 
the first agent leaves off, because it does not recognize it 
has become responsible. Translating this to the current 
problem, the elevator may fail to move, because it does 
not recognize that the passenger has entered and thus 
handed off responsibility for NoDelays. 

This sequence of events is encoded as an abstract sce- 
nario attached to Sequential Split. Instantiated with the 
Sequential Split transformation above, it asks the planner 
to expand a sequence of states where: 

1. ∃p, f On(p, f) & At(f);

2. In(p) & At(f) & ¬ElevatorResponsibleForMove

Note that the abstract scenario introduces a new binary
relation ElevatorResponsibleForMove. This relation rep- 
resents the elevator’s internal model of the condition that 
activates its responsibilities. The failure scenario also 
introduces actions for asserting and deleting this relation. 
As with SKATE scenarios, abstract scenarios in Critter 
can add environment actions and relations to the design 
state for use in generating critiques. In this example, the 
new components allow OPIE to generate a plan in which a 
passenger enters the elevator, but the elevator does not
recognize this (ElevatorResponsibleForMove is false), and
so does not move.

Environment components introduced by attached sce- 
narios allow OPIE to avoid the naive modelling problem. 
They force the designer to consider behavior which is typ- 
ical for a class of problem decompositions, even if the 
designer has neglected to include components which sup- 
port such behavior in the initial design state. 

Returning to our example, the designer acknowledges 
the scenario, and designs a communication protocol to 
prevent it. The passenger becomes responsible for notify- 
ing the elevator when it enters the elevator. The elevator 
will acknowledge the handoff. This can be implemented 
by a familiar interface: passengers hit a button on entry to 
the elevator, and the button lights in response. 
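
The hand-off failure and its repair can be sketched as a toy simulation in Python. The state fields, the function names, and the notify flag (standing in for the button press) are hypothetical names introduced for this example; the sketch only shows why the elevator waits when its internal model is never updated.

```python
def enter(state, passenger, notify):
    """Passenger enters; `notify` models pressing a button on entry."""
    state["in"].add(passenger)
    if notify:
        # The elevator's internal model is updated only when told.
        state["elevator_responsible_for_move"] = True


def elevator_step(state):
    """The elevator moves only if its internal model says it is responsible."""
    if state["elevator_responsible_for_move"]:
        state["at"] += 1  # hypothetical move to the next floor up
        return "move"
    return "wait"


def run(notify):
    """Run the two-step scenario: passenger enters, then the elevator acts."""
    state = {"at": 1, "in": set(), "elevator_responsible_for_move": False}
    enter(state, "p", notify)
    return elevator_step(state)
```

Calling run(notify=False) reproduces the failure scenario (the elevator waits although a passenger is inside), while run(notify=True) corresponds to the repaired protocol.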

The handoff failure scenario thus produces and rational- 
izes an interface component developed in the Feather 
example. This step also shows how failure scenarios
incorporating environment components can expose naive 
assumptions about inter-agent communication, and lead to 
more realistic agent interfaces as a result. 

Conclusions and Issues 

We have proposed an approach to composite system 
design based on problem decomposition. To evaluate 
designs generated by the approach, we have proposed a 
method of scenario-based critiquing using compiled 
knowledge of typical failures of problem decompositions. 
Our method combines the approaches of earlier plan- 
based and case-based design critics we have developed. It 
addresses the problem of matching cases which stymied 
the case-based critic. It also helps solve the problems of 
unfocused search and naive modelling which were the 
principal drawbacks of the plan-based critic.

There remain numerous open research issues for the 
approach. Two issues in particular may be of interest to 
this workshop. 

First, can we store scenarios on transformations which 
are specific enough to be more useful than simply running 
the planner? For example, the incompatibility conspiracy 
scenario was extremely general, and costly to instantiate. 
OPIE could possibly find the associated plan just as 
quickly by directly analyzing the design state. One rejoin- 
der is that the transformation associated with the incom- 
patibility conspiracy scenario, responsibility assignment, 
is too general to have useful scenarios associated with it.
Increasing the grain size of transformations, and placing 
scenarios only on the large-grained transformations, might 
give better results on evaluation, but at a cost of increasing 
the size of Critter’s transformations and complicating their 
application. The research issue: how can we evaluate the 
trade-off between more effective evaluation knowledge, 
versus more general problem decomposition methods? 

Related to the issue of transformation versus scenario
grain size is the question of combining multiple failure 
scenarios. For example, consider the step which split the 
NoDelays goal. In that step, we applied Sequential Split, 
which combined three smaller transformations (Zone 
Defense and two responsibility assignments). The result 
was tested by scenarios stored on Sequential Split. Sup- 
pose instead we had applied the three primitive transfor- 
mations. How should we merge the separate stored failure 
scenarios into a combined scenario; or, alternatively, how 
can we decide which of the scenarios is the most important 
to run? 

Related Work 

Our work extends and formalizes that of Feather [1987], 
who proposed the concept of responsibility assignment 
and informally demonstrated a development methodology 
based on decomposition and assignment of constraints. 
[Dubois, 1990] developed a constraint formalism, and a 
development methodology incorporating responsibility 
assignment, which has influenced our own work. 

The decomposition design process can be viewed as a 
multi-agent extension of “operationalization” [Mostow, 
1983]. Mostow’s FOO and BAR systems designed prob-
lem-solving programs by decomposing and weakening
constraints until they were expressible in terms of easily 
computable functions. The problem-solving systems we 
are designing, however, incorporate a broad range of 
social, hardware, and software systems. Consequently, it is 
difficult to state a compact operationality criterion for a 
given design problem. We rely on the human analyst to 
judge operationality. Similarly, constraint violations in our 
design problems may have consequences ranging from 
trivial to life-threatening. Weakening and approximating 
constraints therefore is much more problematic; we do not 
attempt to address it with our current research. 

[Steier and Kant, 1985] argue for the importance of exe- 
cution in designing algorithms. Our style of critiquing is 
motivated by similar considerations. The approach we 
propose grows out of our previous work on case-based 
[Fickas and Nagarajan, 1988] and planner-based [Ander-
son and Fickas, 1989] critics. [Dubois and Hagelstein,
1988] propose a slightly different approach to critiquing:
derive implications by forward inference over the con- 
straints, and present them to the user for validation. A 
critic using this approach requires knowledge to decide 
which deductions to make; abstract failure scenarios pro- 
vide our method with this guidance. 

Critter’s critiquing task is similar to that of failure crit-
ics in planning systems such as CHEF [Hammond,
1989]. The failure critics of CHEF attempt to steer CHEF’s
planner away from two types of failures:


1. Planning failures. These occur when the planner gener-
ates a plan that does not meet its goals, due to a false
move on the part of the planner, e.g. misordering two
interacting steps.

2. Expectation failures. These occur when the planner 
generates a plan which does not meet its goal when 
executed in the environment of interest. Expectation 
failures arise when the planner’s knowledge of its 
domain is incomplete or incorrect. 

CHEF includes mechanisms for learning new failure crit- 
ics from past planning or expectation failures. It also auto- 
matically indexes failures to planning moves that avoid 
those failures, and to moves which repair those failures. 

In Critter, the “planner” is the user, and the “planning 
moves” are the transformations in Critter’s library. The 
failure scenarios on a transformation identify both plan- 
ning failures and expectation failures which could arise 
from using that transformation. 

Critter does not, however, automatically learn failure 
scenarios from failures when they are encountered, due to 
the generality of its transformation library. CHEF was 
designed to operate within a fairly specific task domain 
(its example domain was Szechuan cooking). Conse- 
quently, it did not have to be too “finicky” in its choice of 
failures to learn [Minton, 1990]. In contrast, we hope to 
reuse Critter’s knowledge base across diverse domains, 
such as transportation systems, network applications, and 
resource management systems. This makes it more diffi- 
cult to automatically decide whether a given failure sce- 
nario is worth storing, and at what level of abstraction it 
should be stored. Our initial focus is thus on automated 
reuse of handpicked failure scenarios; learning the scenar- 
ios from previous design effort is a topic for future work. 

Similarly, Critter does not automatically index from 
failures to avoidance or repair transformations. The 
“plans” (formal specifications) that Critter produces are 
allowed to contain more complex operators -- iterative, 
conditional, and uninstantiated operators, for example -- 
than the plans of CHEF. This makes it harder to explain a 
failure, assign blame for the failure to specification com- 
ponents, and index through those components to relevant 
transformations. For the present, we rely on the designer 
to perform indexing, but view it as an important area for 
future research. 

Abstract

Problem Solving systems customarily use back- 
tracking to deal with obstacles that they en- 
counter in the course of trying to solve a problem. 
This paper outlines an approach in which the pos- 
sible obstacles are investigated prior to the search 
for a solution. This provides a solution strategy 
that avoids backtracking. 

Introduction 

Many weak methods of problem solving are based upon 
the idea that a problem can be solved by choosing a 
sequence of goals and satisfying them in some order. 
GPS (Newell and Simon 1972) was amongst the first 
to set out this approach. Since then the work of Ernst 
and Goldstein (Ernst and Goldstein 1982), Korf (Korf 
1985), and Guvernir (Guvernir 1987) has built upon 
this idea. The culmination of this kind of approach is, 
in some ways, the Soar system, which through the 
creation of a large production system with learning 
capabilities is able to incorporate many of the weak 
problem solving systems into a single system. 

If one compares Soar and Korf’s system, they take
quite distinct approaches to the problem of what
should be learned and when it should be learned.
Korf’s system is able to specify in advance exactly
what macros it needs to learn. This will yield bene- 
fits in the system’s ability to determine which macro 
to use at a given point in the solution, at the price 
of requiring long searches for some of the more com- 
plex macros. Soar on the other hand learns only the 
solutions to the difficulties that actually arise. This 
conservative attitude toward learning means that the 
system can encounter problems in matching expensive 
chunks that do not arise in Korf’s situation. 

This paper looks for a half way house between these 
two strategies. We would like to obtain the benefits 
of easier pattern matching afforded by Korf’s system 
without having to pay the price of the large amount 
of search that his system needs. Our approach is to 
show that for a substantial number of problems one can 
anticipate the impasses that will be encountered by a 
problem solver. These can then be modeled and solved
in small pieces of the larger problem, thus avoiding the
deep searches required in Korf.

Problems, Strategies and Impasses

We review briefly the definitions that we will need. A
consensus as to the appropriate definitions seems to be
emerging (Banerji 1983), (Benjamin et al. 1989), and
(Niizuma and Kitahashi 1985). Our definitions follow
this trend. Some of them have appeared previously in
(Hodgson 1989).

Problems and Subproblems 

Our definition of a problem is based upon the idea of 
an action. 

Definition 1 A free problem $P$ is a triple $(S, \Omega, \alpha)$
where $\alpha$ is a partial map

$$\alpha : S \times \Omega \to S$$

The set $S$ is called the state space of the problem
and the set $\Omega$ is the move set of the problem. The
map $\alpha$ represents the effect of the moves on the state
space. The effect of a move $\omega$ on the state $s$ is to give
the state $\alpha(s, \omega)$. The element $\alpha(s, \omega)$ fails to exist
precisely when $(s, \omega)$ is not in the domain of $\alpha$; that is
when $\omega$ cannot be applied to the state $s$. A sequence
$\Sigma = (\omega_1, \ldots, \omega_n)$ of moves on $P$ is called admissible at
$s$ if the composition

$$\alpha(s, \Sigma) = \alpha(\alpha(\cdots \alpha(\alpha(s, \omega_1), \omega_2) \cdots), \omega_n)$$

exists.
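
For finite problems, Definition 1 translates almost directly into code. The Python sketch below stores the partial map as a dictionary and treats a missing key as “the move does not apply”. The class name FreeProblem and the toy counter example are assumptions made for illustration, and states are assumed never to be the value None.

```python
class FreeProblem:
    """A free problem (S, Omega, alpha) with alpha stored as a partial map."""

    def __init__(self, states, moves, alpha):
        self.states = set(states)
        self.moves = set(moves)
        self.alpha = dict(alpha)  # keys are (state, move) pairs

    def apply(self, state, move):
        """Return alpha(state, move), or None where alpha is undefined."""
        return self.alpha.get((state, move))

    def run(self, state, sequence):
        """Return alpha(state, Sigma), or None if Sigma is not admissible."""
        for move in sequence:
            state = self.apply(state, move)
            if state is None:
                return None
        return state

    def admissible(self, state, sequence):
        """True if the composition alpha(state, Sigma) exists."""
        return self.run(state, sequence) is not None


# Toy example: a counter on {0, 1, 2} that cannot leave that range.
alpha = {(0, "inc"): 1, (1, "inc"): 2, (1, "dec"): 0, (2, "dec"): 1}
counter = FreeProblem({0, 1, 2}, {"inc", "dec"}, alpha)
```

Here the sequence ("inc", "inc") is admissible at 0 and leads to 2, whereas ("dec",) is not admissible at 0 because the pair (0, "dec") is outside the domain of the map.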

We need a notion of maps between problems. 

Definition 2 Given two problems $P_1 = (S_1, \Omega_1, \alpha_1)$
and $P_2 = (S_2, \Omega_2, \alpha_2)$, with a pair of maps $f : S_1 \to S_2$
and $g : \Omega_1 \to \Omega_2$, the pair $(f, g)$ defines a strict
homomorphism $F : P_1 \to P_2$ provided that

1. Given two points $s_1$ and $s_2$ such that $f(s_1) = f(s_2)$,
then if the move $\omega$ applies to $s_1$ it also applies to
$s_2$ and

2. The equation

$$f(\alpha_1(s, \omega)) = \alpha_2(f(s), g(\omega))$$

is satisfied in the sense that whenever the right-hand
side exists so does the left-hand side.

A strict homomorphism $F$ is a monomorphism if
the underlying maps $f$ and $g$ are one to one.
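
For finite problems the two conditions of Definition 2 can be checked exhaustively. The following Python sketch is one possible reading, with all names hypothetical: partial maps are dictionaries keyed by (state, move), and f and g are dictionaries. Condition 2 is implemented exactly as worded above, that is, only the direction “right-hand side exists implies left-hand side exists and agrees” is tested.

```python
def is_strict_homomorphism(states1, moves1, alpha1, alpha2, f, g):
    """Check conditions 1 and 2 of Definition 2 for finite problems.

    alpha1, alpha2: dicts mapping (state, move) -> state (partial maps).
    f, g: dicts mapping states and moves of problem 1 into problem 2.
    """
    # Condition 1: states with the same image admit the same moves.
    for s1 in states1:
        for s2 in states1:
            if f[s1] != f[s2]:
                continue
            for w in moves1:
                if (s1, w) in alpha1 and (s2, w) not in alpha1:
                    return False
    # Condition 2: whenever alpha2(f(s), g(w)) exists, alpha1(s, w)
    # exists too and f carries it to the same state.
    for s in states1:
        for w in moves1:
            rhs = alpha2.get((f[s], g[w]))
            if rhs is None:
                continue
            lhs = alpha1.get((s, w))
            if lhs is None or f[lhs] != rhs:
                return False
    return True


# Toy example: a counter mod 4 mapped onto its parity.
states1, moves1 = [0, 1, 2, 3], ["inc"]
alpha1 = {(s, "inc"): (s + 1) % 4 for s in states1}
alpha2 = {(b, "flip"): 1 - b for b in (0, 1)}
f = {s: s % 2 for s in states1}
g = {"inc": "flip"}
```

Removing the entry for (3, "inc") from the first map breaks condition 1, because states 1 and 3 have the same image but no longer admit the same moves.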

We now turn to the notion of (strict) isomorphism. 

Definition 3 Two problems $P_1$ and $P_2$ are strictly
isomorphic if there exists a pair of mutually inverse
strict homomorphisms $F : P_1 \to P_2$ and $H : P_2 \to P_1$
between them.

We can use monomorphisms in a natural way to de- 
fine subproblems. 

Definition 4 Let $P = (S, \Omega, \alpha)$ be a problem. A prob-
lem $P_0 = (S_0, \Omega_0, \alpha_0)$ is a (strict) subproblem of $P$
if there exists a problem monomorphism $(f, g) = F : P_0 \to P$.

It is worth noting that the requirement that the un- 
derlying state map be a monomorphism has the ef- 
fect that even the weaker definitions of homomorphism 
such as the weak homomorphism of Niizuma and Kita-
hashi (Niizuma and Kitahashi 1985) lead to the same
subproblems. 

As an example of the concepts developed here we can 
take the sliding tile puzzles. In particular we might
take the fifteen puzzle. The state space is then the
set of all legal arrangements of the fifteen tiles and
the blank in the 4 x 4 array. The move set can be
given by the set $\Omega = \{U, D, L, R\}$ where the letter
indicates the direction in which a tile is moved. A 
typical subproblem can be obtained by restricting one’s 
attention to the tiles in the top half, (assuming that the 
blank lies in the top half). Moves on this subproblem 
are restricted to be those in which the blank remains 
in the top half of the array. 
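
A minimal Python sketch of this example follows, under stated assumptions: a state is a tuple of 16 entries in row-major order with 0 standing for the blank, the function names are hypothetical, and the top-half subproblem is modelled simply by rejecting any move that takes the blank out of the top two rows.

```python
N = 4  # board width of the fifteen puzzle

# Direction a tile moves -> (row, col) offset from the blank to that tile.
# A tile moving Up must sit directly below the blank, and so on.
TILE_OFFSET = {"U": (1, 0), "D": (-1, 0), "L": (0, 1), "R": (0, -1)}


def alpha(state, move):
    """Partial map: return the new state, or None if `move` does not apply.

    `state` is a tuple of 16 entries in row-major order, 0 for the blank.
    """
    blank = state.index(0)
    row, col = divmod(blank, N)
    d_row, d_col = TILE_OFFSET[move]
    t_row, t_col = row + d_row, col + d_col
    if not (0 <= t_row < N and 0 <= t_col < N):
        return None  # no tile on that side of the blank
    tile = t_row * N + t_col
    board = list(state)
    board[blank], board[tile] = board[tile], board[blank]
    return tuple(board)


def alpha_top_half(state, move):
    """Restriction of alpha to moves that keep the blank in the top two rows."""
    result = alpha(state, move)
    if result is None or result.index(0) >= 2 * N:
        return None
    return result


solved = tuple(range(1, 16)) + (0,)
```

With the blank in the bottom-right corner, as in solved, only the moves D and R apply, which illustrates why the action map is partial.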

Strategies 

So far we have not recognised that problems are sup- 
posed to represent things that are to be solved. To do 
this we define a problem instance for a problem $P$
as a triple $(P, \sigma, G)$ where $\sigma$ is a state of $P$ called the
start state, and $G$ is a subset of the state space called
the goal set. A solution to the problem instance is a
sequence $\Sigma$ of moves which is admissible at $\sigma$ and such
that $\alpha(\sigma, \Sigma) \in G$.

Informally a strategy is a sequence of intermediate 
subproblem instances between the initial state and the 
goal. We can distinguish two classes of strategies. In 
the first the successive state spaces overlap; we call 
these ample strategies. In the other the successive state 
spaces are disjoint; we call these abutment strategies. 

Definition 5 An ample strategy for a problem in-
stance $(S, \Omega, \alpha, \sigma, G)$ is a sequence $\{P_0, \ldots, P_k\}$ of sub-
problems of $P = (S, \Omega, \alpha)$ such that the state spaces
of successive subproblems have non-trivial intersection,
that is $S_{i-1} \cap S_i \neq \emptyset \;\; \forall i \in 1, \ldots, k$. Furthermore $\sigma \in S_0$
and $G \subseteq S_k$.

An abutment strategy for a problem instance is
a sequence $\{P_0, \ldots, P_k\}$ of subproblems of $P$ such that

1. $\sigma \in S_0$,

2. $G \subseteq S_k$,

3. $S_{i-1} \cap S_i = \emptyset \;\; \forall i \in 1, \ldots, k$,

4. for each $i \in 1, \ldots, k$ there exists at least one pair
$(s_{i-1}, s_i)$ of points of $S$ such that there is a move
$\omega \in \Omega$ for which $\alpha(s_{i-1}, \omega) = s_i$.

A solution is based upon a strategy if it is obtained by 
concatenating a sequence of solutions to the interme- 
diate subproblems. 
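
The four conditions for an abutment strategy can be checked mechanically when the state spaces are finite. The Python sketch below is an illustration only: the function name is hypothetical, and condition 4 is read as requiring the pair to satisfy $s_{i-1} \in S_{i-1}$ and $s_i \in S_i$, which is an assumption about the intended meaning.

```python
def is_abutment_strategy(state_sets, alpha, start, goal):
    """Check conditions 1-4 for a sequence of subproblem state spaces.

    state_sets: list [S_0, ..., S_k] of sets of states.
    alpha: dict mapping (state, move) -> state for the full problem.
    start: the start state sigma; goal: the goal set G.
    """
    if start not in state_sets[0]:            # 1. sigma in S_0
        return False
    if not set(goal) <= state_sets[-1]:       # 2. G is a subset of S_k
        return False
    for prev, cur in zip(state_sets, state_sets[1:]):
        if prev & cur:                        # 3. successive spaces disjoint
            return False
        # 4. some single move leads from a state in prev to a state in cur
        if not any(s in prev and t in cur for (s, _), t in alpha.items()):
            return False
    return True


# Toy example: states 0..5 on a line, with one move "next".
alpha = {(i, "next"): i + 1 for i in range(5)}
good = [{0, 1}, {2, 3}, {4, 5}]
gap = [{0, 1}, {3}, {4, 5}]          # no move leads from {0, 1} into {3}
overlap = [{0, 1, 2}, {2, 3}, {4, 5}]
```

An ample strategy would be checked the same way with condition 3 inverted and condition 4 dropped.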

We illustrate the two kinds of strategies with exam- 
ples. For our example of an ample strategy we con-
sider the case of Fool’s disk. This problem has been
discussed by Ernst and Goldstein (Ernst and Goldstein 
1982). It consists of four concentric rings each of which 
is free to rotate about the common center. Each ring 
has eight numbers on it, appearing at 45 degree inter- 
vals around the ring. The goal of the problem is to 
rotate the rings so that the sum of each radius is 12. 
The standard strategy is as follows: 

• By using only rotations through 45 degrees, make
the sum on each pair of perpendicular diameters 48.
$P_0$ thus has the same state space as P but a smaller
move set.

• By using only rotations through 90 degrees, make
the sum along each diameter 24. $P_1$ has as state
space a set of states in which the sum on each pair
of perpendicular diameters is 48. The move set is
again a subset of the original one.

• By using only rotations through 180 degrees, make
the sum along each radius 12. $P_2$ has as state space
a set of states in which the sum along each diameter
is 24 and once again the move set is a subset of the
original.

This strategy, when successful (about which more
later), reduces the amount of search from $8^3$ moves
(the center ring can be kept fixed) to $8 \times 3$ moves.

Our second example is an elegant solution of the 
five puzzle that has been presented by Banerji (Banerji 
1990). He observes that there is a way to represent the 
states of the five puzzle by points on the faces of a 
dodecahedron. The sequence of moves that circulates
the blank around the circumference of the puzzle
moves through all the states on one face. Passage
from one face to another is effected by the moves that 
slide the blank up and down in the centre column. The 
strategy in this case consists of choosing the sequence 
of faces (each of which is a subproblem) through which 
one must pass from the start to the goal. 

There is an important difference between these two
examples. In the second example, once the strategy is
chosen, no backtracking over the solutions to the
intermediate problems is necessary. In the case of the
Fool's Disk, however, it may be necessary to backtrack,
since the first arrangement in which the sum on all the
diameters is 24 may not lead to a solution, and another
arrangement is then needed. Niizuma and Kitahashi
(Niizuma and Kitahashi 1985) give a sufficient
condition for this not to occur.

Proposition 6 Suppose that for each subproblem
occurring in a strategy it is the case that any instance
of the subproblem can be solved. Then no backtracking
will be needed to construct a solution to the original
problem following the strategy.

It may seem that the restriction on the state spaces 
of the intermediate problems is unduly restrictive. Yet 
it is exactly this that is needed to avoid backtracking. 
Thus one aim of our approach is to find strategies for 
which this hypothesis is true. 

Impasses 

At any stage in the execution of the strategy one has
a subproblem instance
$(P_i, \Omega_i, \alpha_i, \sigma_i, G_i)$ where in the
case of an ample strategy $G_i$ is $S_i \cap S_{i+1}$, or
in the case of an abutment strategy $G_i$ is the set of
points of $S_i$ from which a move to $S_{i+1}$ is possible.
We have seen that the strategy proceeds smoothly as long
as these intermediate problems can be solved.

Definition 7 An intermediate problem for which 
there is no solution is called an impasse for the strat- 
egy. 

This terminology follows one use of the term in the
Soar system; in so doing we are also following the
usage of Ruby in (Ruby and Kibler 1989).

It is important to note that our definition of an im- 
passe in a problem is dependent upon the strategy cho- 
sen to solve the problem. Thus in the Banerji solution 
to the five puzzle there are no impasses since each in- 
termediate goal is attainable. By contrast in the more 
usual method in which the tiles are brought into posi- 
tion in a prearranged order there are impasses. 

Learning the Impasses 

Our approach to finding impasseless strategies is to im- 
prove an existing strategy by modifying the subprob- 
lems so that they do not contain any impasses. As an 
example we show that in the case of Fool’s disk we can 
do this by enlarging the intermediate problems. This 
need not always be the case as we shall see in some of 
the examples that we discuss. 

For the Fool's Disk case we can consider the
intermediate problems defined as follows:

• By using only rotations through 45 degrees, make
the sum on each pair of perpendicular diameters 48.
$P_0$ thus has the same state space as P but a smaller
move set.

• Let $P_1$ have as state space the set of all states in
which the sum on each pair of perpendicular diameters
is 48. The move set is again a subset of the
original one. It may contain some moves through 45
degrees.

• Let $P_2$ have as state space the set of all states in
which the sum along each diameter is 24. The move
set may contain moves through 90 or even 45 degrees.

It is clear that for these problems no backtracking 
into earlier problems is necessary. 

Finding the Impasses 

Problem solvers such as Soar (Laird et al. 1986) and
the stepping stone method (Ruby 1989) find the impasses
in the course of attempting to satisfy the current
goal. A search procedure is then invoked to resolve
the impasse, and the resolution of the impasse becomes
part of the problem solver's knowledge about the
problem. This is an accurate representation of much human
problem solving, but it does not tell the whole story.
Once faced with a problem, a human will actively
consider the difficulties that may arise in the course of
the resolution of the problem to see if they can be
solved. One advantage such an approach offers is that it
allows one to take advantage of efficient storage
techniques once one has determined that a small group of
chunks will be adequate to solve the problem. This
addresses in some measure the problem of expensive
chunks (Tambe et al. 1990).

We give here a recognition criterion that forms the 
basis for an algorithm that can be used to produce 
impasses in problems. The criterion will be stated for 
the cases where the strategy is based upon the idea 
of reducing a set of features to their goal values. We 
begin by formalising this notion. 

Given a problem P, a feature on P is a map

$f : S \to T(f)$

between the state space of P and some finite set $T(f)$
called the target of the feature. A set $\mathcal{F}$ of
features is called discriminating if given any two
distinct states $s_0$ and $s_1$ of P there is some feature
$f \in \mathcal{F}$ such that $f(s_0) \neq f(s_1)$. The set
is called adequate for a goal G if given any state s
which is not a goal state there is some feature f such
that $f(s)$ is not a member of $f(G)$.
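
The two properties can be checked mechanically on a small state space. The following is only a sketch: the toy problem (permutations of three tiles, one feature per position) and the function names are our own illustration and are not taken from the text.

```python
from itertools import combinations, permutations


def is_discriminating(states, features):
    """True if every pair of distinct states is separated by some feature."""
    return all(
        any(f(s0) != f(s1) for f in features)
        for s0, s1 in combinations(states, 2)
    )


def is_adequate(states, features, goal):
    """True if every non-goal state has some feature value outside f(G)."""
    goal_values = [{f(g) for g in goal} for f in features]
    return all(
        any(f(s) not in values for f, values in zip(features, goal_values))
        for s in states
        if s not in goal
    )


# Hypothetical toy problem: states are the orderings of three tiles,
# and feature i reports which tile occupies position i.
states = list(permutations((1, 2, 3)))
features = [lambda s, i=i: s[i] for i in range(3)]
goal = {(1, 2, 3)}

print(is_discriminating(states, features))
print(is_adequate(states, features[:2], goal))
print(is_adequate(states, features[:1], goal))
```

In this toy case the first feature alone is not adequate, because the state (1, 3, 2) already has its goal value on that feature without being a goal state.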

The strategy associated to an ordering
$\{f_1, \ldots, f_k\}$ of a set of adequate features is the
sequence of subproblems
$P_i = (S_i, \Omega, \alpha_i, \sigma_i, S_{i+1})$ where
$S_i$ is the set of all states for which the features
$f_1, \ldots, f_{i-1}$ have goal values, and $\alpha_i$ is
the restriction of $\alpha$ to
$\alpha^{-1}(S_i) \cap (S_i \times \Omega)$. For
these strategies we can give a recognition criterion for
impasses.

Proposition 8 Let
$P_i = (S_i, \Omega, \alpha_i, \sigma_i, S_{i+1})$ be an
intermediate problem for a strategy based upon an
adequate set of features. Then $P_i$ is an impasse
instance if either

• No move changing the value of $f_i$ applies to
$\sigma_i$, or

• Any sequence of moves on P that reduces $f_i$ from
$\sigma_i$ must change the value of at least one of
$f_1, \ldots, f_{i-1}$.

From this point forward the argument goes as follows.
First, find an impasse. Second, produce a
“smaller” example of the same impasse. Third, expand
the example to a subproblem in which the impasse
can be resolved. Finally, show that the problem
has a strategy based upon the new set of subproblems.

Examples of Impasses 

To obtain an impasse of the first kind we can turn to
Sussman's anomaly in the blocks world. The impasse
can be succinctly described by the following figure.



Although one can get “closer” to the goal by putting
B on top of C in the position on the left hand side, it
will be necessary to undo this since the goal of putting
A on B requires that the top of A be clear. Thus no
move that will achieve the desired position for A is
available.

To get an example of the second kind we consider
the fifteen puzzle with the initial strategy
of moving the tiles into position in the order
1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 10, 14, 11, 12, 15 (the
ordering at the end is chosen to be a good one; we do
not need to go this far though).


 1   2   3   5
15   6       4
 9  10  12  11
13  14   8   7

A Fifteen Puzzle Position


In the diagram above we find an impasse when we
come to try and locate tile 4. The smallest subproblem
in which this impasse appears is the 2x2 upper
right hand corner in the diagram, in which
we place 3, 5, 4, blank reading clockwise from the top
left. (The choice of 5 is not significant.) This can
be solved in the five puzzle that is obtained when we
restrict attention to the top two rows and rightmost
three columns of the puzzle. Furthermore we can cover
the state space of the fifteen puzzle with copies of the
five puzzle in the way that will be detailed in the next
section and obtain an impasse free strategy.


In fact the recognition criterion given in proposition 
8 permits one to write a simple program that will gen- 
erate impasses in both these cases. Furthermore the 
expansion of the subproblem described in the example 
of the sliding tile puzzle will provide the means for
resolving the impasses. This is the subject of the next
section. 

Atlases: Solving the problem 

In this section we will describe a modified version of 
the notion of a strategy. In some sense it is a meta- 
strategy in that it is designed to produce an impasse 
free strategy for a problem by choosing the sequence 
of subproblems from a set of subproblems whose images
cover the whole of the state space. The basic idea is
that one determines what impasses may arise in the 
problem and then expands them to subproblems that 
resolve the impasses. These impasse resolving prob- 
lems are then used to cover the state space of the prob- 
lem giving rise to a new strategy. 

Charts 

It is convenient to introduce two auxiliary notions. 
These are chart and atlas. The idea is that charts are
pieces of a problem that are all modeled on some common
subproblem. The important charts will be the
ones that contain the resolutions of impasses. 

Definition 9 Let P be a problem and s a state in P.
Then a chart for P based upon a problem $P_0$ is a
problem monomorphism $P_0 \to P$ whose image contains s.

An atlas for a problem P is a finite collection A of
charts such that every point in the state space of P is
in the image of some chart of A.

We define the images of two charts $f_1 : P_1 \to P$ and
$f_2 : P_2 \to P$ to be incident if either

1. $f_1(P_1) \cap f_2(P_2)$ contains at least one move
common to both subproblems, or

2. there exists a state $s_1 \in f_1(P_1)$ and a state
$s_2 \in f_2(P_2)$ such that there is a reversible move
$\omega$ with $\alpha(s_1, \omega) = s_2$.

The abstraction of a problem associated to an atlas is
the graph whose vertices correspond to the embedded
charts of the atlas with an edge between each pair of
incident charts. A sequence of pairwise adjacent charts
is called a chain.
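
As an illustration, the abstraction graph and a shortest chain can be computed directly when the charts are given by their images. This is a minimal sketch under our own assumptions: each chart is represented only by the set of states in its image, every move is reversible and stored as an unordered pair, and the six-state example is hypothetical.

```python
from collections import deque


def incident(img1, img2, moves):
    """Two chart images are incident if they share a move or a move links them."""
    for move in moves:
        s, t = tuple(move)
        shared = move <= img1 and move <= img2
        linking = (s in img1 and t in img2) or (t in img1 and s in img2)
        if shared or linking:
            return True
    return False


def abstraction(charts, moves):
    """Graph on chart indices with an edge between each pair of incident charts."""
    graph = {i: set() for i in range(len(charts))}
    for i in range(len(charts)):
        for j in range(i + 1, len(charts)):
            if incident(charts[i], charts[j], moves):
                graph[i].add(j)
                graph[j].add(i)
    return graph


def shortest_chain(graph, start_charts, goal_charts):
    """Breadth-first search for a minimal chain of pairwise adjacent charts."""
    queue = deque([c] for c in start_charts)
    seen = set(start_charts)
    while queue:
        chain = queue.popleft()
        if chain[-1] in goal_charts:
            return chain
        for nxt in sorted(graph[chain[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(chain + [nxt])
    return None


# Hypothetical example: states 0..5 on a line with moves between neighbours,
# covered by three charts that do not overlap (an abutment atlas).
moves = {frozenset({i, i + 1}) for i in range(5)}
charts = [{0, 1}, {2, 3}, {4, 5}]
graph = abstraction(charts, moves)
chain = shortest_chain(graph, start_charts=[0], goal_charts={2})
print(graph, chain)
```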

We will want to distinguish between two types of 
abstraction. An abstraction in which the charts over- 
lap will be called an ample atlas. One in which all the 
charts are incident but do not overlap will be called an 
abutment atlas. 

We give two examples of abstractions associated to
an atlas. The first is based upon the earlier solution of
the five puzzle. Here the charts consist of the images of
the subproblem of the five puzzle consisting of those
states that are obtainable by moving the blank around
the circumference of the puzzle. As Banerji remarks
(Banerji 1990) this represents the accessible states of
the five puzzle on the faces of a dodecahedron. The
faces of this are the points of the abstraction and its
edges (which correspond to the move of the blank up or
down in the middle column) correspond to the edges of
the abstraction.

We can obtain an abstraction of the blocks world 
by “welding together” adjacent blocks so that we have 
only three big blocks to consider. Each big block is 
itself a blocks world and the three block world already 
contains the generating example of the impasse. 

These two examples suggest that the correct choice 
of an atlas will allow one to give an impasse free strat- 
egy. 

The Atlas Meta-strategy 

Atlases serve as abstractions of a problem. Given a 
problem instance and an atlas on a problem we can 
define a problem instance on the atlas. The problem
is to find a chain joining a chart containing the start 
position to a chart whose image intersects the goal. 

Definition 10 Given an impasse
$I = (P_0, \Omega_0, \alpha_0, \sigma_0, G_0)$ on a
problem, a chart $f : P_1 \to P$ is said to resolve the
impasse if there is a monomorphism of $P_0$ into $P_1$ and
if the instance I can be solved in $P_1$.

The main ideas of this paper can be summed up in 
the following. 

Proposition 11 Let $\{P_0, \ldots, P_k\}$ be a strategy
for a problem P and let I denote the set of impasses for
this strategy. Let $\{Q_1, \ldots, Q_n\}$ be a set of
charts of P such that each impasse is resolved in at
least one of the $Q_i$. There is an atlas A based upon the
charts $Q_i$ whose associated meta-strategy gives
impasseless strategies for P.

The next section outlines a proof of this result and of
a result on the length of the solutions that it produces.

Solutions and Their Length 

The ideas required to construct the impasseless strat- 
egy are outlined below. The details have been worked 
out for the sliding tile puzzles, the Tower of Hanoi and 
the blocks world but in a manner that is somewhat 
problem dependent. Future work involves unifying the 
implementation so that it applies in a more problem 
independent way. 

Resolving the Impasses 

Let $P_i = (S_i, \Omega, \alpha_i, \sigma_i, S_{i+1})$ be
an impasse arising from the strategy based upon the set
$\{f_1, \ldots, f_k\}$ of features on a problem P. The
following sequence of steps is used to resolve the
impasse.

SHRINK

The goal of this step is to remove from consideration
those features that are not required to construct the
impasse. In general, given a set of features on a
problem, we can restrict to the moves that affect only
these features. The required shrinking takes place by
eliminating the features which are both fixed and whose
value does not figure in the creation of the impasse.

ENLARGE

Moves that affect the remaining features are now
added to produce a subproblem in which the impasse
can be solved. At each step the move added should
affect the smallest possible number of additional
features.

An Example 

We can illustrate this process with the example of the
fifteen puzzle. We saw that an impasse can be reached
when the first three tiles have been placed. The
SHRINK process reduces this to an example equivalent
to a three puzzle in which the tiles appear in the
order 3, x, 4, blank, when read clockwise from the top
left hand corner (x denotes one of the possible tile
values other than those already used). We can then
ENLARGE to a five puzzle, which can be either
horizontal or vertical, in which the impasse is resolvable.

The next step is to determine whether there is an
atlas for the problem whose charts are isomorphic to
the set of subproblems obtained by resolving the
impasses. If this is the case we then replace the original
strategy by the following one. We suppose as before
that we have a problem P with an adequate set of
features $\{f_1, \ldots, f_k\}$. In addition we assume that
there is an atlas A whose charts are isomorphic to the
impasse resolving subproblems obtained by the process
outlined above.

Using the same ordering of features that was used
for the original strategy that produced the impasses,
proceed as follows.

1. Set as the current subgoal the reduction of the next 
feature to its goal value. 

2. As each feature comes up for reduction find a chain 
of minimal length joining the current state to a state 
in the current subgoal. 

3. Extract the move sequence joining the current point 
to one in which the feature has been reduced. 

Since the atlas contains a resolution of all impasses 
this method will solve the problem whenever there is 
in fact a solution. 

The Length of a Solution 

We can now give an estimate for the length of a solution 
found using this method. We need some preliminary 
definitions. 

L will stand for the maximum chain length required to
perform the reduction of a feature.

D will stand for the maximum distance between two
states in a chart. When a particular chart C is
referred to we will use D(C) for the distance on the
chart. Note that this number can be infinite if the
chart is an impasse chart.

N will be the number of features on the problem.

n will be the number of chains required to reduce all
the features.

The first result is the following.

Theorem 12 Let P be a problem with an ample atlas
and features with values of L, D, n as above. Then the
algorithm given above finds a solution of length at most

$L \times D \times n$.

Proof. For each feature the length of chain required to
reduce it does not exceed L; furthermore on each
component of the chain the length of the move sequence
required is at most D. □

The corresponding result for abutment atlases is the
following. The proof is similar.

Theorem 13 Let P be a problem with an abutment
atlas and features with values of L, D, n as above. The
algorithm above supplies a solution of length at most
$n \times (L \times D + L - 1)$.

Although these results are quite simple they give
quite good estimates. For example in the case of the
fifteen puzzle, if we use the estimate of 22 as the
maximum distance on the five puzzle (Banerji 1990), we
get an estimate of $(22 \times 3 \times 15) + 3$ for the
length of a solution. A more perspicuous version of the
argument yields $(19 \times 22) + 3$.
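
The bounds are easy to evaluate numerically. The sketch below simply codes the two theorem formulas and the two expressions quoted for the fifteen puzzle. Reading the quoted expression as D = 22, L = 3 and n = 15 is our own assumption, and the function names are illustrative.

```python
def ample_bound(L, D, n):
    """Theorem 12: solution length at most L x D x n for an ample atlas."""
    return L * D * n


def abutment_bound(L, D, n):
    """Theorem 13: solution length at most n x (L x D + L - 1)."""
    return n * (L * D + L - 1)


# Assumed reading of the fifteen puzzle example: D = 22, L = 3, n = 15.
L, D, n = 3, 22, 15
print(ample_bound(L, D, n))
print(abutment_bound(L, D, n))

# The two estimates quoted in the text, evaluated as written.
first_estimate = (22 * 3 * 15) + 3
sharper_estimate = (19 * 22) + 3
print(first_estimate, sharper_estimate)
```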

Summary and Conclusions 

This paper has presented a method for solving prob- 
lems that constructs the impasses associated to an ini- 
tial strategy in order to be able to find a new strategy 
in which impasses will not arise. 

The method can be applied to produce short solutions
to the sliding tile puzzles as well as to the blocks
world, though the implementation is at this stage still
very problem dependent. Future work will produce a
version that is more general.

Acknowledgements I am grateful to Ranan Banerji
[...] work.

Abstract

The aim of changing representation is the improvement 
of problem-solving efficiency. For the most widely 
studied family of methods of change of representation 
it is shown that the value of a single parameter, called 
the expansion factor, is critical in determining (1) 
whether the change of representation will improve or 
degrade problem-solving efficiency, and (2) whether 
the solutions produced using the change of 
representation will or will not be exponentially longer 
than the shortest solution. A method of computing the 
expansion factor for a given change of representation is 
sketched in general and described in detail for 
homomorphic changes of representation. The results 
are illustrated with homomorphic decompositions of 
the Towers of Hanoi problem. 

Definitions 

The following definitions of the basic elements of 
problem solving are used throughout the paper. 
Only the definition of "solution" is non-standard. 

A state is an atomic object.
An action, or operator, is a function mapping 
states to states. 

A plan, or path, is a sequence of actions. 

A goal is a set of states. 

A problem is a pair consisting of an initial state 
and a goal. 

A problem space is a pair consisting of a set of
states and a set of actions.¹

A solution (of Problem <S0,G>) is a sequence
S0-A1-S1-A2-...-S(n-1)-An-Sn
where Si is a state, Ai is an action
mapping S(i-1) to Si, and Sn is in the
goal G.

A solution plan (of Problem <S0,G>) is a
sequence of actions mapping S0 to a
state in G.

1 This definition assumes all sets of states are allowable
as goals, that all State-Goal pairs are allowable as Problems,
that all sequences of composable actions are allowable as
Plans, etc. This assumption is not essential to any of the
results that follow.


Change of Representation 

Problem space Pspace2 is a change of
representation of problem space Pspace1 if there
exists a relation, R, between the two problem
spaces that "preserves" solutions in the following
sense. If R maps a problem, Problem1, in
Pspace1 to Problem2 in Pspace2, and Solution2
is a solution of Problem2, and R maps Solution2
to Plan1 (a plan in Pspace1), then Plan1 is a
solution plan of Problem1. This definition is
depicted in the following diagram.

Problem1 ------> Problem2
                    |
                    v
Plan1 <------ Solution2

This is an extremely general definition,
presupposing nothing about the nature of the
individual mappings, nor anything about how
problem spaces or the mappings are
implemented.

The net effect of change of representation
is to "decompose" problem-solving in Pspace1
into a three step computation:

(1) translating a given problem into a problem
in Pspace2,

(2) problem-solving in Pspace2, and

(3) translating the solution back into Pspace1.

Change of representation may be applied 
recursively to any of these three steps. Most 
commonly (for example in "hierarchical" 
problem-solving (Knoblock,1991)), it is applied 
repeatedly to step (2) until this step becomes 


trivial. Diagrammatically, this may be depicted as:

Problem1 ------> Problem2 ------> ... ------> ProblemT
                                                 |
                                                 v
Solution1 <------ Solution2 <------ ... <------ SolutionT

The rightmost problem space in this 
diagram can always be assumed to be the trivial 
space consisting of one state and one operator 
(the identity). In this way the explicit problem- 
solving step is entirely eliminated: a problem is 
solved by being translated into the trivial 
problem space and then translating the solution 
back into the original problem space. The total 
cost of problem-solving, then, is the sum of the 
costs of the two translations. 


Solution Refinement 

Within the preceding general strategy for
problem-solving by change of representation
there are many possible variations. One of these
variations, called "solution refinement", is the
subject of this paper. Solution refinement is
defined by the two following properties. First,
the only complex computation is the translation
of a solution in Pspace(K) to a solution in
Pspace(K-1). This computation is called
refinement. Second, refinement preserves the
structure of the solutions, in the following sense.
Suppose the solution in Pspace(K) is
S0-A1-S1-A2-...-An-Sn

A refinement of this solution must have the form
X0-I0-RS0-RA1-X1-I1-RS1-RA2-...-Xn-In-RSn
where

RSi is a state in Pspace(K-1) corresponding to
state Si,

RAi is an action in Pspace(K-1)
corresponding to action Ai and defined on
RS(i-1),

Xi is the result of applying action RAi to
state RS(i-1) (X0 is the start state in the
problem to be solved in Pspace(K-1)),
and Xi-Ii-RSi is a solution to the problem of
getting from Xi to RSi (if it happens that
Xi=RSi, then Ii is empty).

Every action in a solution has a counterpart in 
the refinement of the solution, and usually there 
will be new actions added (the non-empty Ii). 
Therefore, a refinement will usually be longer, 
and can never be shorter, than the solution it is 
based on. In other words, as the initial "trivial" 


solution is translated back to become a solution
in Pspace1 it grows longer and longer: it
expands each time it is refined. The "expansion
factor" (Stefik & Conway, 1982, pp. 10-11) is
defined as the average ratio of the length of a 
refinement to the length of the solution from 
which it was derived. An equivalent definition, 
which will be useful later, is that the expansion 
factor is the average number of states in the 
segments Xi-Ii-RSi. 
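
The second form of the definition translates directly into a computation. As a sketch only, assume a refinement is stored as a list of segments, each segment being the list of states in one Xi-Ii-RSi stretch; this representation and the function name are ours. The sample data are the two segments of the 2-disk Towers of Hanoi example worked through later in the paper.

```python
def expansion_factor(segments):
    """Average number of states in the segments Xi-Ii-RSi of a refinement."""
    return sum(len(segment) for segment in segments) / len(segments)


# Segments of the refinement <1,1>-SC-<2,1>-LA-<2,3>-SC-<3,3>,
# with each state written as a (small disk peg, large disk peg) pair.
segments = [
    [(1, 1), (2, 1)],  # refinement of the <1> segment
    [(2, 3), (3, 3)],  # refinement of the <3> segment
]
print(expansion_factor(segments))
```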

Solution refinement, in various forms, is 
the oldest and most widely studied method of 
change of representation. It is most often 
associated with the use of "abstractions", as in 
ABSTRIPS (Sacerdoti,1974), NOAH 
(Sacerdoti,1977), ALPINE (Knoblock,1991), and 
ABTWEAK (Yang & Tenenberg,1990). But 
solution refinement, as a strategy for problem- 
solving, is equally applicable to many ways of 
decomposing a problem space, not only to 
abstraction. For example, in our research
(Zimmer et al.,1991), Pspace2 may be any
refinable homomorphic image of Pspace1: that is,
the mapping between Pspace1 and Pspace2 may
be any many-to-one mapping of states to states
and operators to operators² such that (1) the
behaviour of operators is preserved, and (2) there
exists a refinement of every solution in Pspace2.
Examples of solution refinement and 
homomorphic decompositions are given in the 
next section. 


The Towers of Hanoi Problem 

Although there are several different ways to 
define the Towers of Hanoi problem space, in 
this paper we will follow the standard definition. 
A state is defined by naming the peg on which 
each of the disks sits. There are 3 pegs, and any 
disk may be on any peg, so if there are D disks
there are 3^D states. An operator is defined by
naming a disk and a direction (clockwise or 
anticlockwise): thus there are 2D operators, given 
D disks. The effect of an operator is to move the 
specified disk from its current peg to the next 
peg in the specified direction. An operator is 
defined on a state only if all the disks smaller 
than the disk to be moved are on the peg that is 


2 We are currently exploring the u«e of many-to-many 
mappings of operators, called "distributed representation!" in 
(Holte.1988). 
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not affected by the operator. 

In the 2-disk Towers of Hanoi problem
there are 9 states, { <S,L> | S, L ∈ {1,2,3} },
where S indicates the peg of the smaller disk and 
L the peg of the larger disk. The 4 operators are 
SC, SA, LC, LA, where S (L) indicates the 
smaller (larger) disk, and C (A) indicates 
clockwise (anticlockwise). 

In the 1-disk Towers of Hanoi problem, 
there are 3 states {<1>,<2>,<3>}, indicating the 
position of the lone disk, and two operators, C 
and its inverse A. 

The Standard Decomposition 

There are exactly four homomorphisms of the 2- 
disk space, as defined above, onto the 1-disk 
space. The standard one can be summarized in 
English as: "ignore the position of the smallest
disk". Formally, in this decomposition state
<S,L> is mapped to state <L>, operators LC and
LA are mapped to C and A, respectively, and 
operators SC and SA are mapped to the identity. 

To illustrate the use of this decomposition 
in problem-solving, consider the problem in 
which the start state is <1,1> and the goal is
<3,3>. This 2-disk problem is mapped into the
1-disk space. The translated problem has <1> as 
a start state and <3> as a goal. The 1-disk 
solution is <l>-A-<3>. This solution implicitly 
has identity operators acting on states <1> and 
<3>. Refinement must now map this solution 
into a solution to the original problem. Operator 
A maps back uniquely to the operator LA, but 
the states do not map back uniquely, nor do the 
implicit identity operators. For example, the 
identity operators on state <1> map back to any 
sequence of operators which, when applied to a 
state in which the larger disk is on peg 1, lead to 
a state in which the larger disk is on peg 1. 

Our refinement algorithm works by
translating a solution in Pspace(K), SolutionK,
into a sequence of subproblems to be solved in
Pspace(K-1). Each State-Operator fragment in
SolutionK is translated into a goal to be solved
starting at the final state of the previously solved
subproblem (or, in the case of the first goal of
this form, starting at the given start state of
Pspace(K-1)). In the present example, SolutionK
is <1>-A-<3>: this is translated into the goal
"<1>-A", whose meaning is "reach a state in
which operator LA is applicable and the larger
disk is on peg 1". Problem-solving commences
from the start state in Pspace(K-1), <1,1>, and
proceeds, as usual, until this goal is satisfied. In
this case the solution is <1,1>-SC-<2,1>. This
solution is the refinement of the <1>- segment of
SolutionK. Note that the one state in SolutionK
has been expanded into 2 states in this
refinement: this expansion factor is the key in
determining whether this decomposition will
improve or degrade the efficiency of
problem-solving.

Continuing with the example, operator LA
is added to the solution, along with the state
(<2,3>) to which it leads from <2,1>. The state
<2,3> will be the start state for the next
subproblem in the refinement process. Because
we have finished with all the operators in
SolutionK, only the final refinement subproblem
remains: the goal is to reach <3,3>, the goal state
in Pspace(K-1). Problem-solving commences
from the state <2,3> and finds the solution
<2,3>-SC-<3,3>. This is the expansion of the
-<3> segment of SolutionK: as before, there is an
expansion factor of 2. The final solution to the
original problem is created by linking together all
the solutions to the refinement subproblems,
giving <1,1>-SC-<2,1>-LA-<2,3>-SC-<3,3>.
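
The worked example can be reproduced with a short program. What follows is a sketch under stated assumptions rather than the implementation used in our system: subproblems are solved by breadth-first search (the text only says that problem-solving proceeds as usual), operators are tried in the fixed order SC, SA, LC, LA, pegs are numbered so that clockwise means 1 to 2 to 3 to 1, and all function names are illustrative.

```python
from collections import deque

OPS = ("SC", "SA", "LC", "LA")


def step(peg, direction):
    """Next peg clockwise ("C": 1->2->3->1) or anticlockwise ("A")."""
    return peg % 3 + 1 if direction == "C" else (peg - 2) % 3 + 1


def apply_op(state, op):
    """Apply an operator such as "SC" or "LA" to (S, L); None if undefined."""
    small, large = state
    disk, direction = op
    if disk == "S":
        return (step(small, direction), large)
    target = step(large, direction)
    # The larger disk may move only if the smaller disk is on the third peg.
    if small in (large, target):
        return None
    return (small, target)


def solve(start, is_goal):
    """Breadth-first search returning alternating states and operators."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if is_goal(path[-1]):
            return path
        for op in OPS:
            nxt = apply_op(path[-1], op)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [op, nxt])
    return None


def refine(abstract, start, goal):
    """Refine a 1-disk solution under the standard decomposition."""
    path = [start]
    for i in range(0, len(abstract) - 1, 2):
        peg, op = abstract[i], "L" + abstract[i + 1]
        # Subgoal "<peg>-op": larger disk on peg and op applicable.
        segment = solve(
            path[-1],
            lambda st: st[1] == peg and apply_op(st, op) is not None,
        )
        path += segment[1:] + [op, apply_op(segment[-1], op)]
    # Final subproblem: reach the goal state of the original problem.
    path += solve(path[-1], lambda st: st == goal)[1:]
    return path


solution = refine([1, "A", 3], start=(1, 1), goal=(3, 3))
print(solution)
```

Each state is written as a pair (peg of the smaller disk, peg of the larger disk), so the printed list can be compared directly with the solution given in the text.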

Non-Standard Decompositions 

In the first non-standard decomposition, state
<S,L> is mapped to state <P> if S and L are both
equal to P or if both are different from P (and
from each other). Thus,
states <1,1>, <2,3>, and <3,2> map to state <1>,
states <2,2>, <1,3>, <3,1> map to state <2>, and
states <3,3>, <1,2>, <2,1> map to state <3>.
Operators SC and LC both map to operator A,
and SA and LA both map to C.

In the second non-standard decomposition,
state <S,L> is mapped to state <S>: that is, the
position of the larger disk is ignored. Thus,
states <1,1>, <1,2>, and <1,3> map to state <1>,
states <2,1>, <2,2>, <2,3> map to state <2>, and
states <3,1>, <3,2>, <3,3> map to state <3>.
Operators SA and SC map to A and C,
respectively, and LA and LC both map to the
identity.

In the final decomposition, state <S,L> is
mapped to state <S-L+1>, where the subtraction
is done modulo 3. In other words, the mapping
is based on the relative positions of the two
disks. States in which the two disks are on the
same peg (<1,1>, <2,2>, and <3,3>) are
mapped to state <1>. States in which the smaller
disk is one peg "ahead" of the larger disk
(<2,1>, <3,2>, and <1,3>) are mapped to state
<2>. And states in which the smaller disk is one
peg "behind" the larger disk (<1,2>, <2,3>, and
<3,1>) are mapped to state <3>. Operators SC
and LA both map to operator C, and operators
SA and LC both map to operator A.

When any of these decompositions is 
applied to the N-disk Towers of Hanoi problem 
space the resulting space is isomorphic to the 
(N-1)-disk space. Hence the same decomposition
can be applied repeatedly to produce a sequence 
of successively smaller problem spaces ending 
with the trivial problem space. 
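
For the 2-disk space, the image of each operator can be derived from a state map instead of being listed by hand. The sketch below does this for the four state maps described above. It assumes the conventions of the worked example (pegs numbered 1 to 3, clockwise meaning 1 to 2 to 3 to 1, a state written as the pair of small-disk peg and large-disk peg), and the names are illustrative. An operator for which the state map does not induce a single 1-disk operator is reported as None.

```python
PEGS = (1, 2, 3)
OPS = ("SC", "SA", "LC", "LA")


def step(peg, direction):
    """Next peg clockwise ("C": 1->2->3->1) or anticlockwise ("A")."""
    return peg % 3 + 1 if direction == "C" else (peg - 2) % 3 + 1


def apply_op(state, op):
    """Apply an operator such as "SC" or "LA" to (S, L); None if undefined."""
    small, large = state
    disk, direction = op
    if disk == "S":
        return (step(small, direction), large)
    target = step(large, direction)
    if small in (large, target):  # smaller disk must be on the unaffected peg
        return None
    return (small, target)


def wrap(x):
    """Reduce an integer to a peg number in 1..3."""
    return (x - 1) % 3 + 1


STATE_MAPS = {
    "standard": lambda s, l: l,
    "first non-standard": lambda s, l: s if s == l else 6 - s - l,
    "second non-standard": lambda s, l: s,
    "final": lambda s, l: wrap(s - l + 1),
}


def induced(h, op):
    """1-disk operator induced by op under state map h, or None if ambiguous."""
    images = set()
    for s in PEGS:
        for l in PEGS:
            nxt = apply_op((s, l), op)
            if nxt is None:
                continue
            before, after = h(s, l), h(*nxt)
            if after == before:
                images.add("identity")
            elif after == step(before, "C"):
                images.add("C")
            else:
                images.add("A")
    return images.pop() if len(images) == 1 else None


table = {name: {op: induced(h, op) for op in OPS} for name, h in STATE_MAPS.items()}
for name, row in table.items():
    print(name, row)
```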


Problem-solving Efficiency 

The aim of all kinds of change of representation, 
including solution refinement, is to improve the 
efficiency of problem-solving. Consequently, it 
would be useful to be able to predict the change 
in problem-solving efficiency that would result 
by making a particular change of representation. 
This ability would enable a system to select the
most efficient among a set of possible changes of
representation, for example, to select the best
of the four decompositions of the 2-disk Towers
of Hanoi problem. And, accompanied by an
estimate of the problem-solving efficiency of the
original problem representation, this ability
would enable a system to determine whether any
of the changes of representation is actually an
improvement.

It is not difficult to analyze the efficiency 
of solution refinement methods under the 
assumption that the expansion factor at every 
level is the same. Let A be the number of 
nontrivial problem spaces, and X be the 
expansion factor. Then the length of the final 
solution is X^A. If W[X] denotes the amount of 
"work" required to refine a single state-operator 
solution fragment, then the total amount of work 
required to create the final solution is 
W[X] * (X^A - 1)/(X - 1). 
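
The two quantities above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated; the function names are hypothetical, and W[X] is passed in as a plain number `w` because the text does not say how it is computed. The factor (X^A - 1)/(X - 1) is the geometric sum 1 + X + ... + X^(A-1).

```python
def solution_length(x: float, a: int) -> float:
    """Length of the final solution after `a` nontrivial problem spaces."""
    return x ** a


def total_work(w: float, x: float, a: int) -> float:
    """Total refinement work, w * (x**a - 1) / (x - 1)."""
    if x == 1:
        # Degenerate case: the geometric sum has `a` terms, each equal to 1.
        return w * a
    return w * (x ** a - 1) / (x - 1)


if __name__ == "__main__":
    # Assumed illustrative values: expansion factor 2, five levels, unit work.
    print(solution_length(2, 5), total_work(1, 2, 5))
```

With X held constant, both quantities grow as X^A, which is why the expansion factor dominates the comparison between decompositions.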

In his thesis (Knoblock, 1991), Craig 
Knoblock observes that if X is a constant and A 
is proportional to the logarithm of the optimal 
solution length, then the work required by 
solution refinement is exponentially less than the 
work required by a brute force problem-solver in 
the original (undecomposed) problem space. 
These circumstances hold when the standard 
decomposition is used to solve Towers of Hanoi 
problems in which all disks are initially on the 
same peg. 

This formula for "work" provides a direct 
way to evaluate the efficiency of different 
decompositions of a problem space, provided 
that one can compute W[X] and measure the 
values of X and A for a given decomposition. In 
fact, the only real difficulty is the calculation of 
X. The number of non-trivial problem spaces is 
normally self-evident, and the term W[X] is 
almost always negligible compared to X^A. Note 
that with the values of X and A we can calculate 
the expected length of a solution as well as the 
expected amount of work required to create it. 

To see how to calculate X, recall that X, 
the expansion factor, is (by definition) equal to 
the average number of states in the segments 
"Xi-Ii-RSi" that are inserted during refinement. 
If the method used to change representation 
imposes constraints on the possible values of Xi 
and RSi, then these constraints may provide 
enough information to compute an expected 
value, or at least an upper bound, on X. For 
example, in homomorphic decompositions it must 
be the case that Xi and RSi are "equivalent", i.e. 
that they are mapped to the same state by the 
homomorphism. Given this fact, the expected 
value of X is simply the "average" length of the 
shortest path (operator sequence) between each 
possible pair of equivalent states. "Average" is in 
quotes because the actual probability of 
encountering each of the <Xi,RSi> pairs in 
practice is normally unknown. 

To illustrate this computation, consider the 
standard decomposition of the 2-disk Towers of 
Hanoi problem space. 9 different <Xi,RSi> pairs 
can be constructed from the 3 states that map to 
<1>. Of these 9 pairs, 3 are of the form <S,S>, 
3 are of the form <S,SC(S)>, and 3 are of the 
form <S,SA(S)>. The shortest path connecting S 
to S has a length (number of states) of 1, and the 
shortest path connecting S to SC(S) or SA(S) has 
a length of 2. The same analysis holds for the 
states that are mapped to <2>, and for those 
that are mapped to <3>. Therefore the expected 
value of X, assuming all pairs of equivalent 
states are equiprobable, is (3*1 + 6*2)/9, or 5/3. 
This turns out to be impossibly low (in the N-disk 
Towers of Hanoi problem X must be larger 
than twice the Nth root of 2/3), an indication 
that not all pairs are actually equiprobable. 
Nevertheless, this value may still be useful to 
compare with the value of X computed in the 
same manner for the other decompositions. 
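
The calculation can be reproduced mechanically. The sketch below is an illustration, not the author's program: it builds the 2-disk space, finds shortest paths by breadth-first search, and averages the path length (in states) over all ordered pairs of equivalent states. It assumes that the standard decomposition ignores the smaller disk, which is what the <S,SC(S)> and <S,SA(S)> pairs suggest; the function and variable names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import product

PEGS = (1, 2, 3)
STATES = list(product(PEGS, PEGS))  # (S, L): peg of small disk, peg of large disk


def neighbors(state):
    """Legal single moves in the 2-disk Towers of Hanoi space."""
    s, l = state
    for peg in PEGS:
        if peg != s:
            yield (peg, l)  # the small disk can always move
        if peg != l and s != l and s != peg:
            yield (s, peg)  # large disk: small disk on neither source nor target


def path_states(start, goal):
    """Number of states on a shortest path from start to goal (BFS)."""
    seen = {start: 1}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return seen[cur]
        for nxt in neighbors(cur):
            if nxt not in seen:
                seen[nxt] = seen[cur] + 1
                queue.append(nxt)
    raise ValueError("goal unreachable")


def expected_x(mapping):
    """Average path length over all ordered pairs of equivalent states."""
    pairs = [(a, b) for a in STATES for b in STATES if mapping(a) == mapping(b)]
    return Fraction(sum(path_states(a, b) for a, b in pairs), len(pairs))


def standard(state):      # assumed: ignore the small disk
    return state[1]


def ignore_large(state):  # <S,L> -> <S>
    return state[0]


def relative(state):      # <S,L> -> <S-L+1>, subtraction modulo 3
    return (state[0] - state[1]) % 3 + 1


if __name__ == "__main__":
    for h in (standard, ignore_large, relative):
        print(h.__name__, float(expected_x(h)))
```

Under the equiprobability assumption this yields 5/3 for the standard mapping and 23/9 (about 2.56) for the two non-standard mappings shown.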


The three other decompositions all have the 
same expected value of X, namely 2.56. This is 
considerably larger than the value for the 
standard decomposition. Thus we expect that the 
standard decomposition will produce shorter 
solutions with less work than the other 
decompositions. To test this prediction, the 
standard decomposition and the second 
non-standard decomposition were used to solve all 
N-disk problems in which all disks are initially 
on peg 1. Work is measured as the number of 
arcs traversed by a breadth-first problem-solver 
before finding a solution (see footnote 3). The 
results of this experiment are: 


# of Disks     WORK 
(N)            Standard    Non-Standard #2 

2                 8.7          12.8 
3                25.4          47.1 
4                63.0         133.0 
5               142.0         328.0 


# of Disks     SOLUTION LENGTH 
(N)            Standard    Non-Standard #2 

2                 3.0           3.2 
3                 5.7           6.7 
4                11.0          14.3 
5                21.7          29.9 


The ratio of successive solution lengths 
gives a true indication of the actual expansion 
factors of the two decompositions: X is optimal 
(slightly less than 2.0) for the standard 
decomposition and 2.1 for the non-standard 
decomposition. The difference in expansion 
factors is much smaller than predicted, but still 
results in a significant difference in solution 
lengths and the work required. 
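
As a quick check, the ratios can be computed directly from the SOLUTION LENGTH figures reported above. The snippet is a small illustration with hypothetical names; the numbers are copied from the experiment, rounded as printed.

```python
# Solution lengths for N = 2, 3, 4, 5 disks, as reported in the experiment.
LENGTHS = {
    "standard": [3.0, 5.7, 11.0, 21.7],
    "non-standard #2": [3.2, 6.7, 14.3, 29.9],
}


def successive_ratios(values):
    """Ratio of each entry to the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    for name, values in LENGTHS.items():
        print(name, [round(r, 2) for r in successive_ratios(values)])
```

Every ratio for the standard decomposition falls just below 2, and every ratio for the non-standard decomposition falls between 2.0 and 2.2.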


If a formula is available to compute the 
expected amount of work required for 
problem-solving in the original (undecomposed) problem 
space, then this can be compared to the work 
formula for solution refinements to determine 
whether a given decomposition will improve or 
degrade efficiency. In the N-disk Towers of 
Hanoi problem space the expected amount of 
work is half the number of arcs in the entire 
space (assuming that the problem-solver never 
traverses the same arc twice), which is given by 
the formula 3*(3^N - 1)/2. Because this formula has 
the same form as the work formula for solution 
refinement, it follows immediately that a 
decomposition will degrade performance on the 
N-disk Towers of Hanoi if and only if its 
expansion factor is 3 or greater. 

3 Unlike the problem-solver in Knoblock's analysis, this 
problem-solver never traverses the same arc twice. This 
simple bookkeeping usually results in an exponential 
reduction in the work required. 

In the same way that the work required 
with and without a change of representation can 
be compared, so too can solution length be 
compared. A breadth first problem-solver always 
finds a minimal length solution. In the N-disk 
Towers of Hanoi problem space, the minimum 
solution length, for the average problem in which 
all disks are initially on peg 1, is 
Comparing this to the expected solution length 
for solution refinement, it follows that a 
decomposition will produce exponentially longer 
solutions whenever its expansion factor is greater 
than 2. 
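
Both baseline quantities for the original space can be measured by brute force for small N. The sketch below is an illustration under stated assumptions, not the author's experiment: it counts a solution's length in states, includes the problem whose goal is the start state, and averages over every possible goal state with all disks starting on one peg. The function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import product


def neighbors(state):
    """Legal moves; state[i] is the peg (0-2) of disk i, disk 0 is smallest."""
    for disk, peg in enumerate(state):
        smaller = state[:disk]
        if peg in smaller:
            continue  # a smaller disk sits on top of this one
        for target in range(3):
            if target != peg and target not in smaller:
                yield state[:disk] + (target,) + state[disk + 1:]


def arc_count(n):
    """Number of undirected arcs in the n-disk problem space."""
    total = sum(len(list(neighbors(s))) for s in product(range(3), repeat=n))
    return total // 2


def average_solution_length(n):
    """Mean number of states on a shortest path from 'all on peg 0'."""
    start = (0,) * n
    states_on_path = {start: 1}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            if nxt not in states_on_path:
                states_on_path[nxt] = states_on_path[cur] + 1
                queue.append(nxt)
    return Fraction(sum(states_on_path.values()), len(states_on_path))


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, arc_count(n), float(average_solution_length(n)))
```

For N = 2 to 5 the arc counts match 3*(3^N - 1)/2, and the averages (3, 17/3, 11, 65/3) round to the Original Space solution lengths in the data below. Under the assumptions stated here they also coincide with (2^(N+1) + 1)/3, which grows in proportion to 2^N.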

The fact that the critical value of the 
expansion factor is different for solution length 
and work-required leads to the apparent paradox 
that some decompositions will construct 
exponentially longer solutions and yet do 
exponentially less work. In fact, the second 
non-standard decomposition exhibits this 
phenomenon, as the following data shows (the 
experimental conditions are the same as before). 

# of Disks     WORK 
(N)            Original Space    Non-Standard #2 

2                  10.8               12.8 
3                  37.8               47.1 
4                 118.8              133.0 
5                 361.9              328.0 


# of Disks     SOLUTION LENGTH 
(N)            Original Space    Non-Standard #2 

2                   3.0                3.2 
3                   5.7                6.7 
4                  11.0               14.3 
5                  21.7               29.9 

Conclusion 


Abstract 

This paper discusses our approach to representing 
application domain knowledge for specific software 
engineering tasks. Application domain knowledge is 
embodied in a domain model. Domain models are used to 
assist in the creation of specification models. Although 
many different specification models can be created from 
any particular domain model, each specification model is 
consistent and correct with respect to the domain model. 
One aspect of the system, hierarchical organization, is 
described in detail. 

Introduction 

Creating, maintaining and evolving software systems 
requires an understanding of both programming knowledge 
and application domain knowledge. Programming 
knowledge is relatively well understood. It is formal, 
modeled in a variety of ways, explicit enough to be taught 
to novices, and general enough to apply across many 
domains. Although empirical field studies (Curtis, et al., 
1988) have shown that application domain knowledge is 
critical to the success of large projects, this knowledge is 
rarely modeled as needed. It is usually implicitly embodied 
in the application code rather than explicitly recorded and 
maintained separately from the code. Even when the 
knowledge is recorded, it is generally stored in voluminous 
natural language documents in an informal rather than a 
formal manner. Although problem-specific languages 
partially remedy this situation, they still capture domain 
knowledge in an ad hoc rather than a systematic manner. 
Furthermore, these languages are generally not designed in 
such a way that the results can be generalized or even 
replicated. 

Application domain models are representations of 
relevant aspects of application domains that can be used for 
different operational goals in support of specific software 
engineering tasks or processes. Domain models determine 
what there is in the world for reasoning about given 
application domains and sanction the types of inferences 
allowed. 

Operational goals are always implicit in the construction 
of a domain model and are essential to understanding the 
form and content of that model. Unlike generalized 
knowledge representation projects such as Cyc (Lenat, 
1990) that attempt to provide a basis for modeling 
encyclopedic knowledge, domain modeling explicitly 
acknowledges the commonly held view (Amarel, 1968) 
that representations are designed for particular purposes. 
These purposes, the operational goals, inherently bias any 
particular solution and dictate the final form of the model. 
As real-world domains are infinitely rich and diverse, we 
inevitably adopt particular perspectives in deciding what is 
relevant with respect to given tasks when formulating 
models (Liu and Farley, 1991). Even within the field of 
domain modeling, many different operational goals and 
modeling projects are being pursued (Iscoe, et al. 1991). 

In the next section, we give an overview of the domain 
modeling research at EDS and our corresponding 
operational goals. We then introduce a model 
reformulation concept: the generation of multiple 
specification models from a single domain model. The 
remainder of the paper focuses on one of the mechanisms 
which allows a specification designer to rapidly construct 
specification models that are consistent and correct with 
respect to the original domain model. 

Domain Modeling Research 

EDS specializes in creating software for a variety of 
industries. Each industry area such as utilities, finance, or 
health insurance has an associated body of knowledge 
which is critical to the understanding of specification and 
implementation of software systems. Domain expertise is 
acquired by personnel over a period of years, and the 
company is organized into strategic business units (SBUs) 
so that knowledge about a particular industry can be 
maintained over time. 

At the EDS Austin research laboratory, we are 
attempting to create a domain modeling system which can 
achieve the following operational goals: 

• Requirements & Specifications — Eliciting, verifying, 
and formalizing software requirements and specifications, 

• Program Transformation/Generation — Transforming a 
specification into efficient executable code, 

• Reverse Engineering — Identifying the semantics of 
existing code in terms of a partial specification, 

• Explanation, Education & Communication — Capturing 
and communicating application domain knowledge. 

The realization of these operational goals is consistent 
with our long-term plan for creating knowledge-based 
tools to support programming-in-the-large (Barstow, 1988) 
development. The domain modeling approach provides 
ample opportunities for investigating and creating new 
development paradigms. 
Figure 1. Domain Modeling with Operational Goals 


Figure 1 illustrates the context in which we model. The 
industry knowledge for each SBU is instantiated into a 
domain model, which then serves as a source of knowledge 
for programs (the ovals) to achieve our operational goals. 
In the figure, the specification model (rectangle) contains 
the specification for a specific system within an application 
domain. Because one of our goals is to generate executable 
code, we require that any particular specification model be 
consistent. A very large but finite number of specification 
models can be created which are consistent and are correct 
with respect to a particular domain model. 



Figure 2. Instantiating Specification Models 


Figure 2 illustrates the two separate modeling tasks 
required by our approach. Domain experts interact with a 
system to store their knowledge in terms of a domain 
model. Specification designers then use the system to build 
specification models which satisfy constraints in the 
domain model. 

In order to create a specification model, the designer 
selects a set of relevant policies and constraints from the 
domain model that must be included and enforced in the 
specification model. The constraints include intra-attribute 
as well as inter-attribute relationships within and across 
entities relevant to the task at hand. 

Dynamic Knowledge Structure 

The remainder of this paper presents one aspect of our 
meta-model representation that is relevant to this 
workshop: dynamic restructuring of hierarchically 
organized domain knowledge. 

While most would agree that hierarchical organizational 
strategies provide a reasonable way to structure knowledge 
within complex domains, the creation of a hierarchical 
structure, like any type of representational scheme, imposes 
a particular view of the world. Unfortunately, there is no 
particular view that is optimal for every application. 
Although the programs within a particular application share 
the same legal, physical, and economic constraints, the 
construction of any particular specification model depends 
upon a set of policy decisions that determine how cases are 
handled. Furthermore, software-in-the-large systems are 
continually changing in such a manner that the concept of a 
static hierarchy is insufficient to capture the process of 
system evolution. 

Consider software systems that manage the payment of 
health insurance claims. Although conceptually simple, 
these systems handle hundreds of thousands of different 
cases. One way to represent these cases is to enumerate the 
leaf nodes of the hierarchies created by the appropriate 
partitioning of attributes such as gender, age, family_status, 
previous_condition, employment, deductibles, copayments, 
prognosis, and so on. Unfortunately, the tree structure 
created by case expansion not only obscures relevant and 
interesting cases, but is also a monolithic structure. It is a 
paradox of object-oriented approaches that well-adapted 
structures are not adaptable to new situations. 

Because of the combinatorial explosion of the leaf 
nodes, it makes sense to handle the cases at as high a level 
as possible. Term subsumption systems such as CLASSIC 
(Borgida, et al. 1989) automate this process by determining 
the place in a hierarchy in which terms are subsumed. But 
subsumption systems assume a single structure in which all 
sub-models can belong. In the case of applications such as 
health insurance, individual modules may have different 
hierarchical structures and still maintain the integrity and 
constraint rules of the domain model. 

Attribute Definitions 

Attributes are normally considered as data values or slot 
fillers within a class or frame. However, the standard 
treatment of attributes as lists of data values with some 
underlying machine representation fails both to capture 
sufficient semantic information from the application 
domain and to state definitions with sufficient formality to 
allow semantics-related consistency checks. 

Attributes are functions which define how a set of 
objects is mapped within a class. One type of attribute has 
a value set represented by a nominal scale which consists 
of a set of values, V(A) = {C_1, ..., C_n}. 

The semantics of an application domain are maintained 
by creating categories in such a way that items to be 
categorized with respect to a particular attribute are as 
homogeneous as possible within a category and as 
heterogeneous as possible between categories. Examples 
of nominal scales abound and map cleanly to the notion of 
enumerated type as shown below: 

(Colors 
   :type nominal_scale 
   :values (Red Yellow Green Blue)) 

The next type of attribute is an ordinal scale — a nominal 
scale in which a total ordering exists among the categories. 
Interval and ratio scales are the more quantitative scales 
and add definitions of dimensions, units, and granularity. 

This brief description of attribute type was included to 
allow the reader to understand the examples in this paper. 
Attributes have additional types and a number of other 
properties which are explained in (Iscoe, et al 1992). 

Hierarchical Decomposition 

Hierarchies are a natural way to view and organize 
information and, at some level of abstraction, are a part of 
most object-oriented and knowledge representation 
languages. Unfortunately, the simplicity of these concepts 
can sometimes obscure the semantics that a model is 
attempting to capture. That one's needs dictate one's 
ontological choice is a fundamental premise of knowledge 
engineering. The ability to systematically define a new set 
of attributes by partitioning the value sets of old attributes 
and then using these new attributes to reclassify the domain 
in accordance with the new requirements is a fundamental 
aspect of our attribute characterization. By preserving the 
"ontological map" as a component of the attribute, the 
domain modeler can shift between the differing paradigms 
modeled by various classes of objects. 

Attribute characterization provides a representation and 
systematic methodology for the partitioning of attributes 
that facilitates the way they are organized, subdivided, and 
built into hierarchies. An attribute restriction is a new 
attribute whose value set and set of applicable relations are 
subsets of the original attribute. 

Creating a new attribute serves the dual purpose of 
creating a set of views on the old attribute as well as 
creating a new attribute. Often, new attributes are defined 
in terms of old attributes by partitioning the original value 
set and then equating each new attribute value with an 
element of the partition. As an example, an accounts 
receivable (AR) system may use the attribute 
days_to_payment whose value is the average number of 
days it takes for the client to pay a bill. 

(days_to_payment 
   :type ratio_scale 
   :dimension time 
   :unit days 
   :min 0 
   :max 360) 


From the standpoint of AR applications, a more useful 
attribute might be: 

(type_of_payer 
   :type ordinal_scale 
   :ordered_by lateness of payment 
   :values (pays_on_time slow_pay dead_beat)) 

This new attribute will be defined by partitioning the 
value set of days_to_payment, Vp by subdividing the 
range of values, then equating each value with one of the 
elements of the partition as illustrated in figure 3 and 
described as follows: 

(type_of_payer 
   :mapped_from days_to_payment 
   (pays_on_time (<= 30)) 
   (slow_pay (AND (> 30) (< 90))) 
   (dead_beat (>= 90))) 



Figure 3 — Partitioning days_to_payment 
Note that the days_to_payment attribute is based on a 
ratio scale while the type_of_payer attribute is based on an 
ordinal scale. In general, a defined attribute represents a 
loss of information (in this example, the number of days 
overdue) in return for a more useful and inherently less 
detailed category. 
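
The partition described above can be sketched as an ordinary function from the ratio-scaled value to the ordinal category. This Python rendering is only an illustration of the mapping; the modeling system itself expresses it declaratively, and the names `ORDER` and `type_of_payer` are chosen here for the example.

```python
# Ordinal categories, listed in order of increasing lateness of payment.
ORDER = ("pays_on_time", "slow_pay", "dead_beat")


def type_of_payer(days_to_payment: float) -> str:
    """Map days_to_payment (ratio scale, 0..360 days) to its category."""
    if not 0 <= days_to_payment <= 360:
        raise ValueError("days_to_payment must lie between 0 and 360")
    if days_to_payment <= 30:
        return "pays_on_time"
    if days_to_payment < 90:
        return "slow_pay"
    return "dead_beat"


def later_payer(a: str, b: str) -> bool:
    """True if category a is later than category b in the ordinal scale."""
    return ORDER.index(a) > ORDER.index(b)


if __name__ == "__main__":
    print([type_of_payer(d) for d in (12, 45, 200)])
```

The mapping is many-to-one: 45 and 60 days both become slow_pay, which is the loss of information the text refers to, while the ordering among categories is preserved.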

Using Population Parameters 

Population parameters facilitate the formation of new 
attributes. For example, some graduate admissions 
committees use interval-scaled GRE scores to separate 
applicants into acceptance categories. Population 
parameters allow designers to create new attributes based 
on restrictions to the original attribute as shown below: 



Figure 4 — Using Population Parameters to 
Restrict an Attribute 

Figure 4 shows the GRE score as an attribute which 
could be attached to a student. Understanding the 
distribution of values within the value set of GRE scores 
allows application designers to create partitions in any one 
of a variety of ways. For example, assume that an 
application designer wanted to create an initial partition 
based on the requirement "accept all students who score in 
the top x% on the GRE, consider those who score between 
x% and y%, and reject those who score in the bottom y%." 
Given this type of requirement, the domain model contains 
the appropriate information to use and an algorithm to 
produce the correct raw score numbers to achieve such a 
partition. 
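
One way such an algorithm might look is sketched below. It is a hedged illustration, not the system's algorithm: it reads the requirement as "accept the top `top_pct` percent, reject the bottom `bottom_pct` percent, consider the rest", works from a sample of scores rather than a stored distribution, uses a simple rank-based cutoff, and ignores ties. All names are hypothetical.

```python
import math


def percentile_cutoffs(scores, top_pct, bottom_pct):
    """Return (reject_below, accept_from) raw scores for the partition."""
    ranked = sorted(scores)
    n = len(ranked)
    n_reject = math.floor(n * bottom_pct / 100)
    n_accept = math.floor(n * top_pct / 100)
    if n_reject + n_accept > n:
        raise ValueError("top and bottom groups overlap")
    # Lowest score that is not rejected, and lowest score that is accepted.
    reject_below = ranked[n_reject] if n_reject < n else math.inf
    accept_from = ranked[n - n_accept] if n_accept > 0 else math.inf
    return reject_below, accept_from


def classify(score, reject_below, accept_from):
    """Place a raw score in one of the three acceptance categories."""
    if score >= accept_from:
        return "accept"
    if score < reject_below:
        return "reject"
    return "consider"


if __name__ == "__main__":
    sample = list(range(1, 101))  # assumed illustrative scores
    low, high = percentile_cutoffs(sample, top_pct=10, bottom_pct=50)
    print(low, high, classify(70, low, high))
```

Returning the raw-score cutoffs, rather than only the classification, is what lets a designer move back and forth between percentile and raw-score statements of the same requirement.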

Another way that these requirements are sometimes 
stated is to build a partition based on an absolute raw score. 
For example, a requirement like "accept all students who 
score above 1450 on the GRE" can be easily incorporated. 
Furthermore, this type of specification can be used 
interactively so that the designer can juggle between raw 
scores and percentiles until the partitions appropriate for 
the application domain are produced. 

Domain and Specification Models 

In this section we focus on relations between attributes 
within a single domain model class. For the purposes of 
this discussion we define the following attributes: 

(name :type identifier) 

(eye_color :type nominal_scale 
   :values (brown, blue, green)) 

(Gender :type nominal_scale 
   :values (male female)) 

(Hysterectomy :type ordinal_scale 
   :values (Y N)) 

(Medicare_payment :type ratio_scale 
   :dimension (money) 
   :unit (dollar) 
   :granularity (.01)) 

(Age_m :type ordinal_scale 
   :values (under65 65_and_over) 
   :mapped_from age 
   (under65 (< 65)) 
   (65_and_over (>= 65))) 

Although other constraints exist, domain model classes 
can be regarded as consisting of sets of attributes which are 
either required or might be included within a particular 
domain model. These constraints are expressed as follows: 

must_have(c, a, cond): attribute a must be used 
in class c in a model if condition cond evaluates 
to true. 

applicable(c, a, cond): attribute a can be used in 
class c in a model if condition cond evaluates to 
true. 

Within any particular specification model, an attribute is 
simply classified as used within a class. 

used(m, c, a, cond): within model m, attribute a 
is used in class c if condition cond 
evaluates to true. 

The most straightforward relationship between a 
domain model and a specification model is that must_have 
attributes are used in all specification models and 
applicable attributes are selected by the specification 
designer. 

must_have(c, a, cond) -> ∀m used(m, c, a, cond) 
applicable(c, a, cond) <-> ∃m used(m, c, a, cond) 

thus 

must_have(c, a, cond) -> applicable(c, a, cond) 

For example, in a domain model, name might be 
required for all specification models, while eye_color could 
be selected only if it were appropriate for the particular 
specification model. 
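
The relationship can be illustrated with a small consistency check. The sketch assumes every condition is vacuously true, represents each class by two attribute sets, and reports a specification model's violations; the dictionary layout and function name are hypothetical and are not the system's actual representation.

```python
# Hypothetical domain model: per class, required and permitted attributes.
DOMAIN_MODEL = {
    "person": {
        "must_have": {"name"},
        "applicable": {"name", "eye_color"},  # must_have implies applicable
    },
}


def check_specification(domain, spec):
    """List the ways a specification model violates the domain model.

    `spec` maps each class name to the set of attributes it uses.
    """
    problems = []
    for cls, used in spec.items():
        rules = domain[cls]
        for attr in sorted(rules["must_have"] - used):
            problems.append(f"{cls}: missing required attribute {attr}")
        for attr in sorted(used - rules["applicable"]):
            problems.append(f"{cls}: attribute {attr} is not applicable")
    return problems


if __name__ == "__main__":
    print(check_specification(DOMAIN_MODEL, {"person": {"name", "eye_color"}}))
    print(check_specification(DOMAIN_MODEL, {"person": {"eye_color"}}))
```

Any specification model for which the list is empty is consistent with the domain model in this restricted sense, which is why many different specification models can coexist.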

(person 
   :must_have ((Name ())) 
   :applicable ((eye_color ())) 
   ...) 

The application of these constraints when cond is 
vacuously true is a fairly standard feature in most modeling 
languages of this type. However, name and eye_color are 
attributes which are total and are not as interesting as the 
cases that occur when the attributes are partial functions. 

Conditions for Function Evaluation 

Recalling that an attribute is a function which maps 
objects to a particular property, cond can be interpreted as 
the condition which must be satisfied for the attribute to be 
a total instead of a partial function. In other words, cond 
defines the subset which is the domain of applicability of 
the partial function. For example, for a person class, 
hysterectomy is only applicable if the gender is female: 

(applicable person Hysterectomy 
   (= Gender female)) 

The domain modeling system is designed so that the 
conditions required to establish the proper domain for an 
attribute are automatically maintained. These conditions 
are constrained in such a way that tractability is maintained 
and are of the form ((p_1 a_1 v_1) ... (p_n a_n v_n)), where p_i is 
the name of a predicate, a_i is the name of an attribute, and 
v_i is a value of the attribute. 
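
A condition in this restricted form is cheap to evaluate. The sketch below treats a condition as a list of (predicate, attribute, value) triples and, as an assumption not stated in the text, interprets the list as a conjunction. The predicate table, the `APPLICABLE` dictionary, and the function names are hypothetical.

```python
import operator

# Predicate names allowed in a triple, mapped to Python comparisons.
PREDICATES = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Applicability conditions per (class, attribute); an empty list is
# vacuously true.
APPLICABLE = {
    ("person", "Hysterectomy"): [("=", "Gender", "female")],
    ("person", "eye_color"): [],
}


def holds(condition, obj):
    """True if every (predicate, attribute, value) triple holds for obj."""
    return all(PREDICATES[p](obj[a], v) for p, a, v in condition)


def is_applicable(cls, attr, obj):
    """True if obj lies in the attribute's domain of applicability."""
    return holds(APPLICABLE[(cls, attr)], obj)


if __name__ == "__main__":
    print(is_applicable("person", "Hysterectomy", {"Gender": "female"}))
    print(is_applicable("person", "Hysterectomy", {"Gender": "male"}))
```

Because each triple compares one attribute with one constant, evaluating a condition is linear in the number of triples.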

When conditions exist, the following axiom is needed: 

(applicable c a cond1) -> 
   [(used m c a cond2) -> (cond1 -> cond2)]     (1) 

A user can create a specification model with any 
particular class hierarchy as long as the domain policies 
and constraints are satisfied. 

Domain and specification model consistency is 
maintained by a specialized theorem prover. The theorem 
prover, STR+VE, is an upgraded version of the prover 
presented in (Bledsoe 1980) for proofs of theorems in 
general inequalities. A TMS is being constructed to 
interface between the modeling system and the theorem 
prover. 

We are currently experimenting with ways to capture 
and verify domain modeling constraints by presenting 
redundant information in a variety of ways. We believe 
that many of the specification problems in large systems 
are created when value set changes cause a single case to 
be changed but fail to correct cases that were identified 
from a previous inference. 

For example, if we assume that hysterectomy is 
applicable to females, the system can infer that 
hysterectomy cannot apply to males by using axiom 1, the 
definition of applicable, and the definition of gender to 
derive a contradiction. 

applicable(c, a, cond) <-> ∃m used(m, c, a, cond) 

applicable(P, hys, [(= gender m)]) 

-> (= Gender, M) -> (= Gender, F) 

(= Gender, M) -> ¬(= Gender, F) 

A key point is that when people are presented with value 
sets they automatically and unconsciously perform 
substitutions such as the ones listed above. This is a 
reasonable way to build a model until a value set changes. 
In large systems, value sets are frequently changed. 
Consequently, conclusions that were drawn by using 
negation to infer values become invalid. We use the 
applicability of conditions and the system’s knowledge of 
value sets to attempt to provide the proper cases for the 
domain modeler to check when conditions change. 
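
The bookkeeping this implies can be sketched as follows. The example is hypothetical: it records which values were excluded only by negation over the value set, and when the value set changes it lists the values a domain modeler would need to review. The added value "unknown" is invented purely for illustration.

```python
def inferred_exclusions(value_set, applicable_values):
    """Values for which non-applicability was inferred by negation."""
    return set(value_set) - set(applicable_values)


def cases_to_recheck(old_value_set, new_value_set, applicable_values):
    """Values whose status must be reviewed after a value set change."""
    old, new = set(old_value_set), set(new_value_set)
    return {
        # New values were never classified explicitly.
        "new_values": (new - old) - set(applicable_values),
        # Conclusions about dropped values no longer refer to anything.
        "dropped_values": old - new,
    }


if __name__ == "__main__":
    old_gender = {"male", "female"}
    new_gender = {"male", "female", "unknown"}  # assumed change
    print(inferred_exclusions(old_gender, {"female"}))
    print(cases_to_recheck(old_gender, new_gender, {"female"}))
```

With the original value set, "male" is excluded by negation alone; after the change, the same negation would silently exclude "unknown" as well, which is the kind of case the modeler should be asked to confirm.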

Discussion 

In this paper, we have presented the concept of modeling 
application domains in order to achieve the operational 
goals of program specification, code generation, and 
reverse engineering. The main concept is that multiple 
specification models can be created that are consistent and 
"correct" with respect to a domain model. Domain models 
represent information about a particular industry area. 
Specification models represent information about a 
particular system. 

Domain and specification models are constructed by 
using a graphical interface to interactively create a set of 
rules based on attribute value set partitions and the 
preceding axioms. The system is being implemented using 
Motif GUI on SPARC workstations. Although it is 
currently operating in a single user mode, it is being 
designed to be accessed simultaneously by multiple domain 
modelers. We are also trying to accelerate the knowledge 
capture process by reverse engineering data models that 
have been captured by an existing EDS case tool and 
instantiating them into the appropriate domain models. 

References 

Amarel, S. 1968. "On Representations of Problems of 
Reasoning About Actions," in Machine Intelligence 3, D. 
Michie, Ed., American Elsevier, New York, pp. 131-171. 

Barstow, D. 1985. "Domain-Specific Automatic 
Programming," IEEE Transactions on Software 
Engineering, vol. SE-11, no. 11, pp. 1321-1336. 

Barstow, D. 1988. "Artificial Intelligence and Software 
Engineering," in Shrobe, H., ed., Exploring Artificial 
Intelligence. AAAI. Morgan Kaufmann, San Mateo, CA. 

Bledsoe, W. W., and Hines, L. M. 1980. "Variable 
Elimination and Chaining in a Resolution-Based Prover for 
Inequalities," Proceedings of the 5th Conference on 
Automated Deduction, Les Arcs, France, Springer-Verlag, 
pp. 70-87. 

Borgida, A., Brachman, R.J., McGuinness, D.L., and 
Resnick, L.A. 1989. "CLASSIC: A structural data model 
for objects," in Proceedings of the 1989 ACM SIGMOD 
International Conference on Management of Data, pp. 59-67. 

Curtis, B., Krasner, H. and Iscoe, N. 1988. "A Field 
Study of the Software Design Process for Large Systems," 
Communications of the ACM, vol. 31, no. 11, pp. 1268-1287. 

Davis, R. 1991. "Knowledge Representation: 
Broadening the Perspective," AAAI-91 Panel, Anaheim, 
CA. 

Iscoe, N., Browne, J.C., Werth, J., and Liu, Z.Y. 1992. 
"Attributes - Building Blocks for Modeling Application 
Domains," Submitted to IEEE TSE. 

Iscoe, N., Williams, G. and Arango, G., Eds. 1991. 
Domain Modeling for Software Engineering, Proceedings 
of Domain-Modeling Workshop, Austin, Texas. 

Lenat, D.B., Guha, R.V., Pittman, K., Pratt, D., and 
Shepherd, M. 1990. "Cyc: Toward Programs with 
Common Sense," CACM, vol. 33, no. 8, pp. 30-49. 

Liu, Z.-Y. and Farley, A. 1991. "Tasks, Models, 
Perspectives, Dimensions," The 5th International 
Workshop on Qualitative Reasoning, Austin, Texas, pp. 1-12. 




Research Summary 

Craig A. Knoblock 

University of Southern California 
Information Sciences Institute 
4676 Admiralty Way 
Marina del Rey, CA 90292 
knoblock@isi.edu 



Reducing search in problem solving is a central issue 
in building systems to solve complex and interesting 
problems. One approach to reducing search is through 
the use of abstraction. My research has focused on 
three closely related issues: identifying the properties 
that comprise a useful abstraction, developing 
techniques for automatically generating abstractions, and 
making effective use of these abstractions in problem 
solving. 

An abstraction space is formed by ignoring details 
of a problem space. An important property of an 
abstraction space is that a plan produced in that space 
can be refined without undoing the work performed in 
the abstract space. [Knoblock et al., 1991b] provides a 
formal characterization of this property, called ordered 
monotonicity, which forms the basis of an algorithm 
for generating abstraction spaces. 

Based on the ordered monotonicity property, I 
implemented a system called ALPINE that automatically 
generates abstractions for problem solving [Knoblock, 
1990b, Knoblock, 1991a]. The system takes both a 
problem and an initial problem space and produces a 
hierarchy of abstract problem spaces that is tailored to 
the particular problem to be solved. ALPINE produces 
abstractions in a variety of domains [Knoblock, 1990a, 
Knoblock, 1991a, Knoblock, 1992] and then uses them 
for hierarchical problem solving. 

The abstractions generated by ALPINE produce 
significant performance improvements [Knoblock, 1991b]. 
The hierarchical problem solver is implemented as an 
extension to the PRODIGY system [Carbonell et al., 
1991], making it possible to experiment on existing 
domains and both combine and contrast the use of 
abstraction with other types of learning. [Knoblock 
et al., 1991a] describes the integration of ALPINE with 
explanation-based learning. We also plan to integrate 
ALPINE with the learning by analogy component in 
PRODIGY. 

Abstract 

There has been much recent work on the use of 
abstraction to improve planning behavior and cost. 
Another technique for dealing with the inherently 
explosive cost of planning is localization. This pa- 
per compares the relative strengths of localization 
and abstraction in reducing planning search cost. In 
particular, localization is shown to subsume abstrac- 
tion. Localization techniques can model the various 
methods of abstraction that have been used, but 
also provide a much more flexible framework, with 
a broader range of benefits. 


1 Introduction 

Over the years, several research results have ap- 
peared on the use of abstraction to guide and im- 
prove planning performance [2, 3, 4, 5, 11, 12]. Ab- 
straction techniques restructure a problem and the 
problem-solving process into a set of “abstraction 
levels.” At the top level of abstraction, the prob- 
lem is described and solved at the most coarse- 
grained level of detail. Each successive level is made 
more concrete than its predecessor by incremen- 
tally adding information into the problem descrip- 
tion. The use of abstraction can benefit planning if 
the solution found at an abstract level serves as a 
good starting point for problem-solving at the next 
level of detail. Thus, abstraction may be viewed as 
a heuristic for ordering which pieces of the overall 
planning problem are solved first, and which later. 
At least two methods have been used within the 
planning community for creating levels of abstrac- 
tion: (1) creation of more concrete levels of detail 
by incrementally decomposing abstract actions into 
more concrete subactions (operator abstraction); and 
(2) creation of more abstract levels by incrementally 
eliminating required action preconditions (state ab- 
straction). 


Recent work has also appeared on the use of do- 
main localization or decomposition to structure a 
problem description and thereby guide and improve 
planner performance. In this case, search savings 
are attained via a “divide and conquer” approach 
to reasoning. A domain and problem description 
(its actions, definitions, goals, preconditions, and 
any other constraints or properties) are divided up 
into regions. Semantically, regions define the precise 
“scopes of interaction” between domain properties 
and actions. Each region consists of a subset of the 
overall set of actions and the various properties and 
goals that pertain to those actions. 1 The localiza- 
tion structure of a domain is then used to break the 
planning space into a set of smaller reasoning spaces 
(each constructing a plan for a particular region) 
and to determine how these spaces are searched. In 
[9], a localized search algorithm is described, along 
with analytical and empirical results that demon- 
strate how exponential savings in search cost can be 
achieved. 

This paper compares the relative strengths of lo- 
calization and abstraction as heuristics for reduc- 
ing planning search cost. In particular, localization 
is shown to subsume abstraction; localization can 
model abstraction “levels,” but also provides a more 
flexible framework for domain partitioning, with a 
broader range of planning benefits. Section 2 begins 
with a characterization of the planning search space 
and the relative search benefits achievable via local- 
ization and abstraction. Section 3 then provides a 
description of the localized reasoning frameworks of 
two planners - GEMPLAN [6, 7, 8, 9] and COL- 
LAGE, a new system that builds upon the ideas in 
GEMPLAN. Analytical and empirical results that 
describe the cost savings attainable by utilizing lo- 
calization are also summarized, as well as tradeoffs 
in its use. Next, Section 4 shows how localization 
can encode the various commonly used methods of 
abstraction. Finally, Section 5 concludes with fur- 
ther discussion of the strengths and weaknesses of 
the two techniques. 

1 Problem reduction (translation of a goal into subgoals) 
may also be used to decompose a planning problem [1]. How- 
ever, this kind of technique may more properly be viewed 
as a problem solving method rather than a search reduction 
technique, though search savings may occur as a result. 

2 The Planning Search Space 

Consider a search space in which each node is asso- 
ciated with a plan and each arc is associated with a 
plan-construction operation that transforms a plan 
into a new plan (typically via the addition of actions, 
relations, or variable bindings). Such a tree directly 
reflects plan-space search, but can also be mapped 
onto state-space search. In the latter case, the plan 
P associated with a node is mapped onto the “state” 
reached after executing P, and each arc operation is 
mapped onto the action (i.e., the reasoning the plan- 
ner must perform in order to add that action) that 
takes it from one state into the next state. Given this 
characterization of planning search, we can see that 
the cost of both state-based and plan-based search can 
be improved in at least three ways: 

1. Lowering operation cost - i.e., reducing the 
cost of each arc or plan-construction opera- 
tion. Since most planning algorithms are NP- 
complete in the size of the plan, reducing plan 
size is one way of lowering operation cost. 

2. Operation ordering - i.e., choosing a good or- 
der in which to apply plan-construction opera- 
tions. Goal-ordering is one example of this, as 
are other heuristics for determining how a plan- 
ning space is searched. A good operation or- 
dering may result in less backtracking, but may 
also improve solution quality. 

3. Reducing implicit search space size, typically 
by lowering the branching factor of the search 
space. Of course, decreasing necessary back- 
tracking via operation ordering may reduce how 
much of a space is actually searched. But limit- 
ing the applicable operations at each node ab- 
solutely reduces the total size of the space. 
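
The relative weight of the first and third factors can be illustrated with a small calculation. The following is only a sketch under simplifying assumptions that are not part of the paper: a complete search tree with uniform branching factor, a fixed cost per arc, and the hypothetical names tree_size and worst_case_cost. Operation ordering is not modeled, since it changes how much of the tree is visited rather than its size.

```python
def tree_size(branching: int, depth: int) -> int:
    """Number of nodes in a complete search tree of the given depth, root included."""
    return sum(branching ** level for level in range(depth + 1))


def worst_case_cost(branching: int, depth: int, op_cost: float) -> float:
    """Cost of expanding every arc of the tree at a fixed cost per arc (assumed model)."""
    arcs = tree_size(branching, depth) - 1
    return arcs * op_cost


if __name__ == "__main__":
    # Same depth throughout; vary either the per-arc cost or the branching factor.
    baseline = worst_case_cost(branching=4, depth=10, op_cost=5.0)
    cheaper_ops = worst_case_cost(branching=4, depth=10, op_cost=1.0)
    fewer_branches = worst_case_cost(branching=2, depth=10, op_cost=5.0)
    print(baseline, cheaper_ops, fewer_branches)
```

Under these assumptions, lowering the per-arc cost scales the total linearly, while lowering the branching factor shrinks the space itself, which is why the third factor is described above as an absolute reduction.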

Both localization and abstraction may be viewed 
as problem-solving heuristics for reducing planning 
search cost. Alternatively, they may be viewed as 
ways of reformulating or recasting a planning prob- 
lem so that the cost of search required to solve that 
problem is reduced. Abstraction techniques explic- 
itly break the problem-solving process up into “ab- 
straction levels.” At each level, more information is 
added into the problem definition (e.g., actions are 
decomposed or preconditions are added) to create a 
more complex planning problem. Since abstraction 
levels inherently control the order in which pieces of 
the problem are tackled, abstraction is a heuristic for operation 
ordering. In earlier stages of the reasoning process, 
only “higher” level operations, which involve high- 
level actions or conditions, are applied. This set is 
expanded as the problem and domain definition is 
expanded. Although abstraction also initially lim- 
its the set of applicable operations at each search 
node, an inherent reduction of applicable operations 
is not a guarantee of the abstraction technique once 
the domain is fully expanded. Rather, it is the job 
of abstraction-derivation techniques to form abstrac- 
tion hierarchies that guarantee properties like mono- 
tonicity [4], which limit interaction between the ac- 
tions and states of the various abstraction levels. 

Rather than dividing a problem definition into ab- 
straction levels, localization divides a problem ac- 
cording to the inherent scope of its actions, proper- 
ties, and goals. A particular localization or domain 
decomposition provides a planner with a semantic 
definition of the scope of all domain actions and 
properties. Each region may be viewed as a “scope” 
of reference, with an associated set of actions, defi- 
nitions, goals, etc. As a result, a localization can be 
used to determine which domain actions and prop- 
erties interact, and which are independent. Local- 
ization then forms a valid basis for partitioning the 
planning search space into a set of smaller spaces 
(one for each region), for focusing the application of 
plan-construction operations to specific pieces of the 
plan, and for triggering those operations at appro- 
priate times. 

Moreover, unlike abstraction, localization can be 
used to encapsulate domain information based on 
any criterion, not just “abstractness.” The re- 
gion divisions are based on the particular qualities 
and scopes of the domain rather than a particu- 
lar “abstraction-inducing” technique such as oper- 
ator or state abstraction. Thus, abstraction-based 
localizations might be used, but also physically- 
based, process-based, or temporally-based partition- 
ings, which may be more compelling. 

Finally, and perhaps most importantly, localiza- 
tion allows for domain regions that overlap and in- 
teract. While it is often difficult to attain a clean 
partitioning into abstraction levels (often resulting 
in a collapse of levels or a great deal of interac- 
tion between levels), localization embraces the no- 
tion that real-world domains cannot be neatly 
decomposed and will naturally entail regional over- 
lap. Thus, the localization technique explicitly pro- 
vides methods for coping with regional interaction. 

In terms of the potential search benefits described 
above, localization can achieve all three: 

1. Plan-construction operations are applied to 
much smaller regional plans. Thus, localization 
reduces operation cost. While abstraction may 
provide a way of initially working on smaller 
plans at higher levels of abstraction, ultimately, 
the scope of the planning algorithms becomes 
the most detailed plan. Thus, abstraction does 
little to partition the scope of reasoning and 
does not inherently improve operation cost. 

2. The localized search technique directs search 
flow so that only “relevant” operations are ap- 
plied at each point in the reasoning process (i.e., 
those operations relevant to regions whose plans 
have been modified). Thus, localization con- 
trols operation ordering. 

3. Since only region operations are applicable at 
each region search node, localization reduces 
search space size by limiting the branching fac- 
tor at each search node. 

3 Localized Representation 
and Reasoning 

We now explain the localization technique by de- 
scribing its instantiation in two localized planners, 
GEMPLAN and COLLAGE. In both systems, a 
region R is defined by a region description: 

<actions(R), definitions(R), constraints(R), subregions(R)> 

Each region R is associated with a search tree, 
tree(R), whose role is to construct a plan, plan(R), 
that satisfies all regional constraints, given available 
actions and definitions. Each plan is a partially or- 
dered set of actions. The set actions(R) defines 
a set of action types which are considered to be- 
long directly to R and instances of which may occur 
within plan(R). (Note that plan(R) may also in- 
clude actions belonging to subregions of R.) The 
set definitions(R) includes any definitions pertain- 
ing to activity in plan(R). The set constraints(R) 
includes “constraints” that must be satisfied by 
plan(R). Finally, subregions(R) consists of regions 
belonging to R. 2 

The regions comprising a domain may take on any 
structural configuration - they may be disjoint, form 
hierarchies, or even overlap. Semantically, a particu- 
lar decomposition defines the scope of domain prop- 
erties; the scope of each definition and constraint 
associated with R is plan(R) - which may be com- 
posed only of actions in R and its subregions. It 
is the role of the domain describer to ensure that 
these scoping semantics are correct; the planner as- 
sumes that they are. The only required criterion for 
domain decomposition is that each constraint and 
definition belong to a region that includes at least 
the entire “scope of applicability” of that definition 
or constraint (but possibly more). 

2 Section 3.1 describes the relationship between “actions, 
definitions, and constraints” and more traditional planning 
representations. 

As an example, consider the small construction 
domain depicted in Figure 1. It has been partitioned 
into regions that include the activities of an electri- 
cian and plumber. These regions include subregions 
that contain activities at specific walls. Each wall 
region would be associated with definitions and con- 
straints that are relevant only to the actions that can 
take place at that wall. In contrast, electrician (or 
plumber) definitions pertain to all activity directly 
within electrician (plumber), as well as all activ- 
ity at its wall subregions. Since wall A is shared 
by the electrician and plumber, both the electri- 
cal and plumbing constraints apply to the activity 
within wallA. The constraints directly associated 
with wall A itself would probably include those re- 
lating to coordination of the plumber and electrician 
activities at that wall. The figure also shows search 
trees for these regions. Each tree is concerned with 
building a plan for its region that satisfies all re- 
gional constraints. The planning process may thus 
be viewed as a set of “mini-planners,” tied together 
by the structural relationships between regions. 
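
The scoping semantics of this example can be sketched in a few lines of Python. This is an illustration only, not the GEMPLAN or COLLAGE representation: the Region class, the helper functions, and every action and constraint name below (pull_wire, drill_wallA, electrical_code, and so on) are hypothetical stand-ins chosen for the construction domain.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A region description: actions, definitions, constraints, subregions."""
    name: str
    actions: set = field(default_factory=set)
    definitions: set = field(default_factory=set)
    constraints: set = field(default_factory=set)
    subregions: list = field(default_factory=list)


def plan_scope(region: Region) -> set:
    """Action types that may occur in plan(region): its own and those of nested subregions."""
    types = set(region.actions)
    for sub in region.subregions:
        types |= plan_scope(sub)
    return types


def applicable_constraints(action_type: str, regions: list) -> set:
    """Constraints of every region whose plan may contain the given action type."""
    found = set()
    for region in regions:
        if action_type in plan_scope(region):
            found |= region.constraints
    return found


# Hypothetical contents for the regions of Figure 1; wallA is shared by both trades.
wall_a = Region("wallA", actions={"drill_wallA"},
                constraints={"coordinate_trades_at_wallA"})
electrician = Region("electrician", actions={"pull_wire"},
                     constraints={"electrical_code"}, subregions=[wall_a])
plumber = Region("plumber", actions={"install"},
                 constraints={"plumbing_code"}, subregions=[wall_a])
DOMAIN = [electrician, plumber, wall_a]
```

With these assumed contents, applicable_constraints("drill_wallA", DOMAIN) collects the constraints of all three regions, while an electrician-only action such as pull_wire is subject to the electrician's constraints alone.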


[Figure not reproduced: regions electrician and plumber, their shared subregion wallA, and the search trees for these regions.] 

Figure 1: A Localized Construction Domain 

3.1 Localizing Traditional Planning 
Representations 

One important distinction between the planning rep- 
resentation of GEMPLAN and COLLAGE and 
that of traditional planners is the encoding of do- 
main information in terms of “actions,” “defini- 
tions,” and “constraints” rather than STRIPS-like 
operator descriptions. One reason for this is that it 
allows domain information to be more easily local- 
ized. In a traditional planning representation lan- 
guage, an “action description” is bound up with ac- 
tion preconditions and effects. The “definition” of a 
particular state predicate is essentially a side-effect 
of the set of action descriptions within the domain 
and is thus “distributed” throughout the domain 
description. Whether or not a literal P is true at 
some point in the plan is determinable by examining 
the actions within the plan, along with their defined 
preconditions and effects, and seeing whether they 
“combine” to achieve P. The goals that a traditional 
planner attempts to achieve are a combination of 
user-provided top-level goals and subgoals that are 
posted to fulfill action preconditions. 

In contrast, the GEMPLAN/COLLAGE frame- 
work separates the definition of actions from their 
preconditions and effects. Action-type definitions are 
simply descriptions of the action types themselves - 
an action “name” with a set of parameters. For in- 
stance, in a blocks world domain, pick(block) would 
define an action type, an instance of which is pick(a). 
A state predicate is defined separately by an explicit 
predicate definition. In the GEMPLAN implemen- 
tation of the blocks world, the following definition 
of clear(B) is used: 3 

strips_definition(clear(B), 
    [adder(pick(Y), on(Y,B)), 
     adder(put(B,_), true), 
     deleter(put(_,B), true), 
     deleter(pick(B), true)]). 

A predicate definition includes a list of conditional 
adder and deleter descriptions. The first parameter 
of “adder” or “deleter” is an action type which adds 
or deletes the predicate, under the condition in the 
second parameter. For example, an action of type 
pick(Y) adds clear(B) if on(Y,B) is necessarily true 
just before it, put(B,_) always adds clear(B), and 
put(_,B) or pick(B) always delete clear(B). Sepa- 
rating predicate definitions from action descriptions 
allows actions and predicates to be individually lo- 
calized (see Section 4). It also makes conditional 
effects easy to describe; for example, that an action 
adds a particular literal P in some contexts and an- 
other literal Q in others. 
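
To make the reading of such a definition concrete, the sketch below evaluates conditional adders and deleters over a totally ordered sequence of actions. It is a simplification and not GEMPLAN's mechanism: GEMPLAN plans are partially ordered, whereas this code assumes a single linear order, and the on(X,Y) definition is an assumption added here so that the condition on(Y,B) can be evaluated. All Python names are hypothetical.

```python
def on_facts(state):
    """All on(X, Y) facts in a state, as (x, y) pairs."""
    return [(f[1], f[2]) for f in state if f[0] == "on"]


# clear(B), transcribed from the definition in the text. Each rule is
# (kind, action type, function giving the predicate instances affected).
CLEAR = [
    ("adder", "pick", lambda args, state: [(b,) for (y, b) in on_facts(state) if y == args[0]]),
    ("adder", "put", lambda args, state: [(args[0],)]),
    ("deleter", "put", lambda args, state: [(args[1],)]),
    ("deleter", "pick", lambda args, state: [(args[0],)]),
]

# on(X, Y): an ASSUMED definition, not given in the text.
ON = [
    ("adder", "put", lambda args, state: [(args[0], args[1])]),
    ("deleter", "pick", lambda args, state: [(x, y) for (x, y) in on_facts(state) if x == args[0]]),
]

DEFINITIONS = {"clear": CLEAR, "on": ON}


def step(state, action, definitions=DEFINITIONS):
    """State after one action; conditions are evaluated just before the action."""
    name, *args = action
    added, deleted = set(), set()
    for predicate, rules in definitions.items():
        for kind, action_type, instances in rules:
            if action_type != name:
                continue
            facts = {(predicate, *inst) for inst in instances(args, state)}
            (added if kind == "adder" else deleted).update(facts)
    return (frozenset(state) - deleted) | added


start = {("on", "a", "b"), ("clear", "a"), ("clear", "c")}
after_pick = step(start, ("pick", "a"))
after_put = step(after_pick, ("put", "a", "c"))
```

Note that the rules are grouped by predicate rather than by action: the effect of pick(a) on clear and on on is assembled from two separate definitions, which is what allows each definition to be placed in its own region.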

In the GEMPLAN/COLLAGE framework, ac- 
tion preconditions and top-level goals are also de- 
scribed as separate entities - they are explicitly de- 
fined as constraints. For example, in the blocks 
world domain description, we have: 

3 Capitalized tokens (or the character “_”) represent vari- 
ables. Lowercase is used for constants. Thus, notation of the 
form pick(X) or pick(_) is used to denote any pick action with 
a single parameter. 


constraint(precondition(pick(B), clear(B))) 
constraint(precondition(put(X,B), clear(B))) 

Such constraints can be easily localized. Also, note 
how the separation of precondition constraints from 
predicate definitions clearly distinguishes between 
necessary action preconditions and those conditions 
utilized only for describing conditional effects. 

Given a framework of actions, definitions, and 
constraints, planning may be viewed as “constraint 
satisfaction” rather than backwards and/or forwards 
chaining on state-based goals and conditions. In 
GEMPLAN and COLLAGE, a “constraint” is 
simply any property that the planner knows how 
to test and make true. The standard STRIPS-based 
algorithms form only a subset of the possible meth- 
ods of plan construction in GEMPLAN and COL- 
LAGE - many other kinds of constraint forms and 
satisfaction algorithms are provided by the two sys- 
tems. Any of these constraint forms may be used 
to encode domain properties, and all constraints are 
appropriately scoped by the localization structure of 
a domain. Thus, in many ways, both systems may be 
viewed as general constraint-based reasoners rather 
than strictly as planners. 

3.2 Localized Search 

Once a domain has been localized, its regional struc- 
ture guides how localized search is performed. As 
described earlier, each tree(R) is concerned with 
constructing a plan(R) that satisfies constraints(R) 
given actions(R), definitions(R), and the actions 
and definitions of all subregions (and subsubregions, 
etc.) of R. Each tree node is associated with the re- 
gion plan constructed up to that point in the search, 
and each tree arc is associated with a plan modifi- 
cation that transforms a region plan into a new re- 
gion plan. Upon reaching a node, the planner must 
choose which region constraint to check next. (Thus, 
an implicit branching factor in the search space is 
the set of all region constraints at each node.) If the 
chosen constraint is not satisfied by the plan at that 
node, constraint satisfaction algorithms must be ap- 
plied, resulting in a set of new region plans at the 
next level down in the tree. A constraint satisfac- 
tion algorithm typically adds new actions, relations, 
and variable bindings to a region plan, and may also 
generate new subregions. For example, in order to 
satisfy a precondition constraint, one option is to 
add an action and appropriate relations that estab- 
lish that action as an “adder” of the precondition. 
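
The node-expansion step just described can be sketched as a small depth-first search for a single region. This is a minimal illustration under strong assumptions, not the algorithm of [9]: a plan is reduced to a totally ordered tuple of action names, the planner always checks the first unsatisfied constraint, and the Constraint class and the two constraint forms (requires, before) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Plan = Tuple[str, ...]


@dataclass
class Constraint:
    name: str
    holds: Callable[[Plan], bool]          # test the constraint on a plan
    fixes: Callable[[Plan], List[Plan]]    # candidate repaired plans


def requires(action: str) -> Constraint:
    """Goal-like constraint: the plan must contain the action."""
    return Constraint(f"requires({action})",
                      holds=lambda plan: action in plan,
                      fixes=lambda plan: [plan + (action,)])


def before(first: str, second: str) -> Constraint:
    """Precondition-like constraint: every `second` is preceded by some `first`."""
    def holds(plan: Plan) -> bool:
        return all(first in plan[:i] for i, a in enumerate(plan) if a == second)

    def fixes(plan: Plan) -> List[Plan]:
        i = next(i for i, a in enumerate(plan) if a == second and first not in plan[:i])
        return [plan[:i] + (first,) + plan[i:]]

    return Constraint(f"before({first},{second})", holds, fixes)


def region_search(plan: Plan, constraints: Sequence[Constraint],
                  depth: int = 0, max_depth: int = 10) -> Optional[Plan]:
    """Each node holds a plan; each arc applies one fix for one violated constraint."""
    violated = next((c for c in constraints if not c.holds(plan)), None)
    if violated is None:
        return plan                      # all regional constraints satisfied
    if depth == max_depth:
        return None                      # give up on this branch
    for new_plan in violated.fixes(plan):
        result = region_search(new_plan, constraints, depth + 1, max_depth)
        if result is not None:
            return result
    return None                          # backtrack
```

For instance, starting from the empty plan with requires("install") and before("prep", "install"), the search first adds install and then inserts prep ahead of it.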

Because it is partitioned into regional search trees, 
localized search is more complicated than the tradi- 
tional global search utilized by most planners. The 
localized search algorithm described in [9] has two 
basic functions: (1) global correctness: making sure 
that all constraints that need to be checked are 
checked and that appropriate shifts occur 
between regional search spaces; and (2) global con- 
sistency: making sure that all of the plan fragments 
being constructed (especially those shared by more 
than one super-region plan) are consistent with each 
other. This second function is much like that of a 
distributed database and is ensured by updating all 
relevant plans for ancestor regions of R, each time 
search exits from tree(R). Global correctness is 
ensured, first, by making sure that all regions are 
searched at least once, and second, by making sure 
that search eventually occurs for a region R when- 
ever R's plan has been affected by some previous 
plan modification. GEMPLAN uses a fixed strat- 
egy for controlling search flow and consistency main- 
tenance, but COLLAGE allows for more flexible 
approaches. 
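
The two conditions for global correctness suggest an agenda-driven control loop, sketched below. This is a hedged reading of the description above and not the fixed GEMPLAN strategy or the COLLAGE agenda mechanism: the function names, the parents map, and the search_region callback (which returns True when it modified the region's plan) are assumptions, and the propagation of plan updates to ancestors is reduced to re-queueing them.

```python
from collections import deque


def ancestors(region, parents):
    """All super-regions of a region, nearest first, without duplicates."""
    seen, pending = [], list(parents.get(region, []))
    while pending:
        p = pending.pop(0)
        if p not in seen:
            seen.append(p)
            pending.extend(parents.get(p, []))
    return seen


def localized_search(regions, parents, search_region):
    """Visit every region at least once; revisit a region whenever its plan was affected."""
    agenda = deque(regions)
    queued = set(regions)
    visits = []
    while agenda:
        region = agenda.popleft()
        queued.discard(region)
        visits.append(region)
        if search_region(region):
            # A change in this region's plan also changes the plans of its ancestors.
            for ancestor in ancestors(region, parents):
                if ancestor not in queued:
                    agenda.append(ancestor)
                    queued.add(ancestor)
    return visits


def demo():
    """Construction domain: wallA modifies its plan once; the trades make no changes."""
    parents = {"wallA": ["electrician", "plumber"]}
    pending = {"wallA": 1}

    def search_region(region):
        if pending.get(region, 0) > 0:
            pending[region] -= 1
            return True
        return False

    return localized_search(["electrician", "plumber", "wallA"], parents, search_region)
```

The loop terminates only if regional search eventually stops modifying plans; in the demo, the single change at wallA causes exactly one revisit of each of the two contractor regions.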

In some senses, localized search control may be 
viewed as a TMS-like strategy for maintaining con- 
straint satisfaction - only “affected” constraints 
need to be rechecked. Unlike a true TMS, however 
(which also tries to capture “what affects what”), 
domain localization is a broad-brush heuristic strat- 
egy that need not be accompanied by perpetual 
and expensive reasoning to update those dependen- 
cies. The domain decomposition provides a “cut” at 
defining scope and interactions; the planner uses it, 
but never needs to verify it or update it. In this re- 
spect, localization provides the same level of heuris- 
tic information as abstraction, providing a “useful” 
partitioning of domain information. However, local- 
ization can encapsulate information based on many, 
perhaps mixed, criteria. Some regions may cap- 
ture physical structures, others may represent pro- 
cesses, and others may represent abstraction hier- 
archies within these or overlaid with these. For in- 
stance, the construction domain of Figure 1 includes 
regions that are physically-based (the walls) as well 
as those that represent contractor “processes.” One 
might view localization as having the ability to cap- 
ture both “horizontal” as well as “vertical” decom- 
position. 


3.3 Localized Search Benefits and 
Tradeoffs 

In [9] a detailed complexity analysis is provided that 
highlights the potential benefits and tradeoffs of lo- 
calized search. That paper also provides some ini- 
tial empirical results that support this theoretical 
analysis. This section summarizes these benefits and 
tradeoffs. 

Since the cost of localized search for a partic- 
ular domain is very dependent on the particular 
constraints, structure, and problem specification for 
that domain, the “general” complexity analysis de- 
scribed in [9] was performed on a somewhat ideal- 
ized domain scenario. The search cost of a global, 
non-localized domain was compared with that of the 
same domain, partitioned into a set of m subregions 
each of which overlaps by some factor k with an- 
other region g. An original set of n_c constraints was 
partitioned among these m+1 regions. Table 1 sum- 
marizes the results of this analysis. Com- 
plexity results were calculated assuming that all con- 
straints were either constant, linear, quadratic, or 
exponential in cost relative to the size of the plan. 
The table also compares the cost of best-case and 
worst-case search. Best-case measures the cost of 
one path through the search space (no backtrack- 
ing), and worst-case measures the cost of the entire 
potential space. The term i is the size of the final 
plan. The term n_f is the number of potential fixes 
for each constraint. Finally, C is the cost of main- 
taining consistency, which is assumed to be O(m^2 k). 

These results show that localized search is nearly 
always better than non-localized search - in most 
cases much better. The only exceptions are 
constant-complexity best-case search (when there is 
no reduction in the amount of the space searched 
nor in constraint algorithm cost) or when the cost 
of consistency maintenance overshadows the cost of 
the search. The amount by which localized search 
wins over non-localized search is proportional to m 
(the amount of decomposition), but inversely pro- 
portional to mk (the amount of overlap). Thus, in- 
creased decomposition is always worthwhile, except 
for the cost of increased overlap. The gains of local- 
ized search become exponential as the complexity of 
the constraint algorithms increases and the amount 
of the space actually searched increases. These gains 
come from three sources, which correspond directly 
to the three factors described in Section 2: 

1. The cost of each arc - i.e., operation cost. Even 
if the absolute size of the non-localized and 
localized search spaces are the same, expen- 
sive constraint algorithms are applied to much 
smaller plans in the localized case. 

2. The search heuristics provided by localization - 
i.e., operation ordering. Because of the seman- 
tic information provided by a localized domain 
description, the most relevant constraints tend 
to be applied at the right time, enabling a re- 
duction in the amount of the search space that 
actually needs to be searched. 

3. The size of the search space - i.e., branching 
factor reduction. This is because only regional 
constraints are relevant at each node. 
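
The first of these sources can be illustrated numerically. The code below is a toy cost model written for this purpose; it is not the analysis of [9] and does not reproduce the entries of Table 1. It assumes best-case search (each constraint applied once), an even split of the plan across m regions, regional plans enlarged by the overlap factor k, and a consistency cost of exactly m^2 k. The function names are hypothetical.

```python
def global_cost(n_c, plan_size, cost):
    """Best case, non-localized: each of n_c constraints is applied to the whole plan."""
    return n_c * cost(plan_size)


def localized_cost(n_c, plan_size, m, k, cost):
    """Best case, localized: constraints see only a regional plan, plus consistency upkeep."""
    regional_plan = (plan_size / m) * (1 + k)   # assumed: even split, enlarged by overlap
    consistency = m * m * k                     # assumed form of the O(m^2 k) term
    return n_c * cost(regional_plan) + consistency


def constant(size):
    return 1.0


def quadratic(size):
    return size * size


if __name__ == "__main__":
    for label, cost in [("constant", constant), ("quadratic", quadratic)]:
        g = global_cost(40, 100, cost)
        loc = localized_cost(40, 100, m=4, k=0.1, cost=cost)
        print(label, g, loc)
```

Even in this crude model, the two behaviors noted above appear: with constant-cost constraints the localized configuration pays only the consistency overhead and gains nothing, while with quadratic constraints the smaller regional plans dominate.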

[Table 1 is not recoverable from the source. Surviving labels: “complexity of c(i) and f(i)”; columns “Non-Localized (best-case)”, “Localized (best-case)”, “Non-Localized”; rows “constant” and “exponential”.] 

Table 1: Complexity Results 




Empirical tests were also carried out which bol- 
ster these results. In [9], several decompositions 
of a building-construction domain were compared, 
as well as the effects of varying the size of the ac- 
tual building plan constructed. In this domain, con- 
straint cost was fairly low (close to linear for most 
constraints) and there was no backtracking. Even 
so, the search cost of the best localization was less 
than 50% of the non-localized domain configuration. 
The results also show that increased localization pro- 
vides increased benefit, except for the added expense 
due to increased regional overlap. However, as also 
predicted by the complexity results, the detrimental 
effects of increased overlap become overshadowed as 
plan size and search space size increase. 

One of the focuses of the COLLAGE project is 
to flesh out our understanding of localized search by 
performing many more controlled experiments. The 
new COLLAGE search control architecture features 
a constraint-activation and consistency-activation 
agenda mechanism that allows for various aspects 
of the search strategy to be easily modified and re- 
configured. Using this architecture, we plan to test 
a suite of search strategies over a suite of prob- 
lem types that vary in the amount of backtrack- 
ing required, constraint algorithm difficulty, as well 
as domain localization structure and problem size. 
Finally, we also hope to come up with a localiza- 
tion learning approach that automatically discov- 
ers domain-dependent and domain-independent lo- 
calization heuristics. 

4 Modeling Abstraction With 
Localization 

In order to model traditional planning-based ab- 
straction methods in a localized framework, we must 
create “levels” of reasoning, representing incremen- 
tally more detailed descriptions of the domain. Re- 
ferring to the characterization of a region description 
in Section 3, we can see that this can be achieved by 
incrementally adding regions and/or subregion links, 
constraints, action types, and definitions. The ad- 
dition of this information can be done by a special 
search step that introduces the next “level” of rea- 
soning. Both GEMPLAN and COLLAGE already 
incrementally add regions during planning, and both 
systems access relevant domain information in such a 
way that makes incremental addition of other types 
of information trivial to accommodate. 4 In addition, 
the incremental addition of subregion containment 
relationships adds an interesting “twist” to the types 
of abstraction levels attainable; a region may be ini- 
tially visible to some super-regions and then incre- 
mentally become a subregion of additional regions, 
resulting in “mix-and-match” levels of abstraction. 

4.1 Operator Abstraction 

[Figure not reproduced: three configurations. Recoverable labels: “information added to form a new level of abstraction”; region wallC with Action Types: prep, insert and Constraint: decompose(install,[prep->insert]); region plumber-intentions with Action Types: install.] 

Figure 2: Operator Abstraction 

4 All domain information accessed by the plan-construction 
operations is represented and accessed in plan-relative fash- 
ion. As a result, new constraints, actions, regions, and def- 
initions can be “added” to a plan and thus become newly 
accessible to the reasoning mechanism. 

One of the constraint forms available in GEM- 
PLAN and COLLAGE is the decompose con- 
straint, which requires that actions of a specified 
type be [conditionally] decomposed into one of a 
set of possible patterns of interrelated subactions. 
Operator abstraction can be modeled in a localized 
framework by incrementally adding such action de- 
composition constraints and, optionally, incremen- 
tally adding regions which contain the subactions at 
the “next level down.” Different degrees of interac- 
tion between “levels” may be achieved, depending 
on the localization configuration used. 
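
A decompose constraint of this kind can be sketched as a test-and-fix pair over a plan. The code is an illustration under stated assumptions and not the GEMPLAN constraint form: it handles a single unconditional pattern, treats the pattern as a simple sequence, and uses hypothetical names throughout (Plan, decompose_holds, decompose_fix, and instance labels such as install#1).

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    actions: list = field(default_factory=list)   # action instances, e.g. "install#1"
    order: set = field(default_factory=set)       # (earlier, later) pairs
    parts: dict = field(default_factory=dict)     # parent instance -> its subactions


def action_type(instance: str) -> str:
    return instance.split("#")[0]


def decompose_holds(plan: Plan, parent_type: str) -> bool:
    """True when every action of the parent type has been decomposed."""
    return all(a in plan.parts for a in plan.actions if action_type(a) == parent_type)


def decompose_fix(plan: Plan, parent_type: str, pattern: list) -> Plan:
    """Add the subaction pattern, in sequence, for each undecomposed parent action."""
    for a in list(plan.actions):
        if action_type(a) == parent_type and a not in plan.parts:
            suffix = a.split("#")[1]
            subs = [f"{t}#{suffix}" for t in pattern]
            plan.actions.extend(subs)                # the parent stays in the plan
            plan.order.update(zip(subs, subs[1:]))   # prep before insert
            plan.parts[a] = subs
    return plan


plan = Plan(actions=["install#1"])
was_satisfied = decompose_holds(plan, "install")
decompose_fix(plan, "install", ["prep", "insert"])
```

The parent action is deliberately left in the plan after decomposition, so that constraints written at either level of detail can still refer to it.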

Figure 2 provides three sample configurations 
for modeling operator abstraction in the construc- 
tion domain of Figure 1. 5 In all three cases, an 
install action of the plumber is decomposed into 
two subactions at a lower level of detail, prep 
and insert. In configuration1, no wall sub- 
region exists. Instead, operator abstraction is 
achieved by simply adding a decompose constraint 
to plumber. In configuration2, the subaction 
types prep and insert and a subregion wallC con- 
taining them are also added, thus creating a new 
level that includes new action types and a new 
subregion. In configuration3, region wallC con- 
tains the decomposition constraint and subactions, 
but overlaps with plumber only at the point of 
the higher level action install. Note how, in 
configuration2, the lower-level actions in wallC 
become subject to plumber's constraints, introduc- 
ing potential interaction between “abstraction lev- 
els.” In configuration3, this interaction does not 
exist except at region plumber-intentions. If only 
region plumber adds install actions, no planning 
interaction will occur once region wallC is added 
(i.e., there will be no need to recheck the constraints 
in plumber), thereby guaranteeing monotonicity in 
the reasoning process. For more discussion of mono- 
tonicity and related properties, see Section 5. Also 
note how, in general, constraints may refer to actions 
at mixed levels of detail. Unlike many hierarchical 
planners, GEMPLAN and COLLAGE allow both 
actions and their subactions to be present within a 
plan simultaneously. 
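
The difference between the second and third configurations can be made concrete by asking which regions must be searched again when an action of a given type is added. The sketch below encodes one reading of the two configurations; the region contents are assumptions made for illustration (in particular, that plumber reaches install only through plumber-intentions in the third configuration), and the function names are hypothetical.

```python
def scope(region, own, subregions):
    """Action types that may appear in plan(region): own types plus those of subregions."""
    types = set(own.get(region, ()))
    for sub in subregions.get(region, ()):
        types |= scope(sub, own, subregions)
    return types


def triggered(action_type, regions, own, subregions):
    """Regions whose plan is affected when an action of this type is added."""
    return sorted(r for r in regions if action_type in scope(r, own, subregions))


# Second configuration (assumed contents): wallC is a subregion of plumber.
CONFIG2 = {
    "own": {"plumber": {"install"}, "wallC": {"prep", "insert"}},
    "subregions": {"plumber": ["wallC"]},
}

# Third configuration (assumed contents): plumber and wallC overlap only at
# plumber-intentions, which holds the higher level action install.
CONFIG3 = {
    "own": {"plumber-intentions": {"install"}, "wallC": {"prep", "insert"}},
    "subregions": {"plumber": ["plumber-intentions"], "wallC": ["plumber-intentions"]},
}

REGIONS = ["plumber", "wallC"]
prep_in_config2 = triggered("prep", REGIONS, **CONFIG2)
prep_in_config3 = triggered("prep", REGIONS, **CONFIG3)
install_in_config3 = triggered("install", REGIONS, **CONFIG3)
```

Under this reading, adding prep in the second configuration affects both plumber and wallC, whereas in the third it affects wallC alone; only install, the shared action, reaches both regions.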

4.2 State Abstraction 

A localized framework can also model state ab- 
straction in several ways, depending on the desired 
effect. In Figure 3, three possible configurations are 
given in which various preconditions and definitions 
affecting the install action and its subactions are 
incrementally added. Configuration1 illustrates 
how action preconditions (or top-level goals) can be 
added on a per-action basis, by simply incremen- 
tally adding precondition (or goal) constraints. If we 
wished a specific predicate to be completely unavail- 
able until a certain “abstraction level” (achieving a 
“partitioned hierarchy” [4]), its predicate definition 
and all precondition or goal constraints that utilize 
that predicate would not be added until that “level” 
in the reasoning process is reached. Another option 
is to incrementally add available “lower level” action 
types that have been defined to be adders or deleters 
of a predicate. In this way, action effects (rather 
than just preconditions) may be incrementally added 
to the level of reasoning. In configuration2, a lit- 
eral [illegible] cannot be “deleted” until the lower- 
level action type prep is added to the domain. This 
next level also includes a new precondition for visual- 
ok(unused-site), as well as a new definition for 
unused-site based on the lower level actions prep and 
insert. 

Figure 3: State Abstraction 

5 The constraint syntax used in Figures 2 and 3 has been 
simplified for illustrative purposes. 

One can achieve a strict, noninteracting par- 
tition of predicates and actions into levels (i.e. 
monotonicity), by utilizing the strategy depicted 
in configuration3. Here, the new region wallC 
is added which contains a new precondition constraint, 
actions, and definitions at the next level 
down. In this case, region wallC overlaps with re- 
gion plumber rather than being strictly contained 
within it. Thus, if we adhere to a regimen in which 
only region plumber adds actions of type install 
(and only region wallC can add actions of type prep 
and insert), a strict separation of effect would be 
achieved - changes within region wallC would not 
trigger search within plumber, thereby guaranteeing 
monotonicity. 
5 Discussion 

The primary point of this paper has been to show 
that localization is more general than abstraction - 
it can capture the same kind of heuristic informa- 
tion, but can also express other forms of encapsula- 
tion, with potentially greater benefits. This section 
discusses the impact of localization on such prop- 
erties as monotonicity and tries to shed some light 
on other plusses and minuses of localization and ab- 
straction. 

In [4], several properties are described that provide 
a useful basis for the formation of abstraction 
hierarchies. These include the upward solution 
property, monotonicity, and ordered monotonicity. 
By constructing abstraction hierarchies in a 
way that ensures these properties, guarantees can 
be made about the completeness of an abstracted 
search space and the amount of backtracking that 
will be necessary. In particular, the upward solution 
property guarantees that decomposing the problem 
into abstraction levels will not remove completeness 
from the search space. Monotonicity properties ad- 
ditionally remove the need to backtrack into higher 
levels of the reasoning space. 

If localization is used to represent abstraction, 
what effect does this have on these properties? Does 
strictly controlling the ordering of constraint appli- 
cation remove possible solutions? Once actions, con- 
straints, and regions are added into the domain spec- 
ification, will backtracking to a point in the reason- 
ing space before this addition be required? These 
are precisely the kinds of questions that localization 
addresses. A localization structure captures the de- 
fined semantics of interaction between actions and 
constraints. If two constraints do not apply to the 
same pieces of the growing plan, they do not inter- 
act and their relative constraint ordering does not 
make a difference. Likewise, if actions, constraints, 
or regions incrementally added to the planning prob- 
lem do not cause triggering of previously defined 
constraints, a pure refinement strategy is possible 
- no backtracking will be necessary. And even if 
the regional configuration of a localization does not 
by itself guarantee independence, search heuristics 
(that encode knowledge about such guarantees) can 
be used to block unnecessary backtracking or con- 
straint rechecking. 
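
One way to picture the triggering behavior described above is as a lookup from a newly added action to the regions whose constraints mention it. The sketch below is illustrative only: `SCOPES` and `regions_to_recheck` are hypothetical names, and the action types assigned to each region are assumptions loosely based on the plumber/wallC example, not GEMPLAN or COLLAGE code.

```python
# Hypothetical scopes: the action types mentioned by each region's constraints.
# The assignment below is an assumption made for illustration.
SCOPES = {
    "plumber": {"install"},
    "wallC": {"install", "prep", "insert"},
}


def regions_to_recheck(action_type, scopes):
    """Return the regions whose constraints mention the added action type."""
    return {region for region, types in scopes.items() if action_type in types}


# Adding a 'prep' action only concerns wallC; plumber is left untouched.
prep_regions = regions_to_recheck("prep", SCOPES)
# Adding an 'install' action concerns both overlapping regions.
install_regions = regions_to_recheck("install", SCOPES)
```

Under these assumed scopes, a regimen in which only wallC adds prep and insert actions never triggers rechecking inside plumber, which is the pure refinement situation the paragraph describes.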

In sum, if localization is used to capture exactly 
and only the forms of abstraction available in the 
various systems outlined in [4], then localization 
will manifest the same properties as those systems. 
Guarantees about such things as monotonicity are 
a function of the abstractions or localizations cho- 
sen for a specific domain. The techniques used by 
Knoblock [3] to learn abstractions that guarantee or- 


dered monotonicity, or those used by Christensen [2], 
could also be used within a localization framework. 

But an advantage of using a localized framework 
is that it can be used to capture much more than ab- 
straction. Depending on the domain, physically- or 
process-based localizations might reap even greater 
search benefits than abstraction-based localizations. 
Even though “levels” of reasoning can be modelled, 
they form only a small portion of the structuring 
capabilities of localization. While properties such 
as ordered monotonicity may be useful, they come 
at a price. Since monotonicity requires noninterac- 
tion between levels, it may result in a collapse of 
the hierarchy. Indeed, this might be fairly common, 
since real-world problems rarely lend themselves to 
pure refinement strategies. In a localized framework, 
there is no need to collapse levels or regions into each 
other if they are not strictly independent. A local- 
ization need not be organized hierarchically and does 
not necessarily have to engender separate planning 
“phases.” Interactions are handled as a basic mech- 
anism of the search process which directs the flow of 
reasoning, without necessarily invoking backtracking 
into a “previous level” of reasoning. Finally, in a lo- 
calized framework, actions, definitions, constraints, 
and regions may be incrementally added in flexible 
ways. “Levels of detail” may be mixed among con- 
straints. The addition of subregion relationships can 
incrementally and selectively increase the scope of 
constraints. 

Of course, just as for abstraction, the trick is to 
find a good localization that reaps as many search 
benefits as possible. As discussed earlier, a research 
focus in COLLAGE is automatically learning such 
localizations. The key is to find a decomposition 
that balances decomposition and interaction. In- 
creased decomposition results in finer-tuned local- 
ization of constraints, but also results in increased 
regional overlap and accompanying increases in con- 
sistency maintenance costs and potential “thrash- 
ing” between regional search spaces. The tradeoff 
between locality and overlap mirrors the abstraction 
tradeoff between increasing the number of abstrac- 
tion levels and increasing the amount of interaction 
between levels. 

Admittedly, the cost of dealing with regional over- 
lap and the complexity of localized search is a limita- 
tion of the localization technique. Because abstrac- 
tion simply partitions the search into ever-growing 
levels of detail, it can still use global search meth- 
ods. The management of regional search in a local- 
ized framework requires more work. Other problem 
reformulation techniques may also be more feasible 
in an abstraction-based framework, where the problem 
may be “reformulated” before search proceeds at 
each level. However, this might also be accomplished 
in a localized framework via incremental modification 
of domain constraints, actions, and definitions. 

Finally, localization is also applicable to other 
kinds of tasks. Localization can be used to encap- 
sulate any kind of domain information - not just 
STRIPS preconditions, goals, and action decompo- 
sition. The technique can be used by any kind of 
reasoning that can be cast in terms of constraints ap- 
plied to a partitionable frame of reasoning. Methods 
based on localized search have already been incorpo- 
rated into a scheduler [13], another planner that uses 
abduction as the primary plan-construction mecha- 
nism [10], and an image understanding framework 
[14]. Localization can also aid replanning and plan 
reuse. If certain pieces of a plan become faulty 
during run-time, localization provides a good first- 
cut at which pieces of the plan can be reused and 
which constraints must be rechecked. Localization- 
based replanning and reuse is another focus of COL- 
LAGE. 
Abstract 

The notion of irrelevance underlies many different 
works in AI, such as detecting redundant facts, 
creating abstraction hierarchies and reformulation 
and modeling physical devices. However, in order 
to design problem solvers that exploit the notion 
of irrelevance, either by automatically detecting 
irrelevance or by being given knowledge about ir- 
relevance, a formal treatment of the notion is re- 
quired. 

In this paper we present a general framework for 
analyzing irrelevance. We discuss several prop- 
erties of irrelevance and show how they vary in 
a space of definitions outlined by the framework. 
We show how irrelevance claims can be used to 
justify the creation of abstractions thereby sug- 
gesting a new view on the work on abstraction. 

Introduction 

Meta-level reasoning has received a lot of attention 
from researchers in artificial intelligence as a means 
of guiding problem solvers in their search for solutions 
[Hayes, 1973; Genesereth, 1988; Smith and Genesereth, 
1985; Clancey, 1983]. A common meta-level strategy 
is to avoid using knowledge that is irrelevant 
to the goal at hand. In fact, the notion of 
irrelevance has been a common theme in many research 
works, but its formal analysis has received attention 
from only a few researchers, such as Subramanian 
and Genesereth [Subramanian and Genesereth, 1987; 
Subramanian, 1989]. The ability to give a problem 
solver advice about what parts of a knowledge base are 
irrelevant to a specific problem solving goal is a power- 
ful method to reduce its search. For example, consider 
a domain in which we are trying to find routes between 
cities in the country, using flights, trains and busses. 
For some goals, we might want to advise the problem 
solver that rules and facts about flights are irrelevant, 
either because the minimal price of flights is known to 
be greater than is required for the specific goal or be- 
cause we know that flights will not yield an optimal 
solution. By giving this advice, we significantly reduce 


the size of the search space explored by the problem 
solver. 

The notion of irrelevance also plays a key role in 
work on abstractions and change of representation. In- 
tuitively, when we want to create a simpler or abstract 
representation we remove some irrelevant detail. If the 
removed detail was indeed irrelevant then the solution 
to the problem in the abstract theory will map back to 
a solution in the original theory. Therefore, if we can 
provide the system with knowledge about irrelevance 
or relative irrelevance of knowledge, the system can 
exploit it to automatically create abstractions. Meth- 
ods for mechanically detecting relevance can be used 
to automatically create abstractions. 

However, both in order for a user to be able to 
state such claims to a system in a principled man- 
ner and for the system to make proper use of given 
claims, a better analysis of the notion of irrelevance 
in problem solving is required. This paper describes 
a general framework for analyzing the notion of irrel- 
evance. We define a space of possible definitions of 
irrelevance by identifying several axes along which ir- 
relevance claims differ. Several important properties of 
irrelevance concerning their usage in problem-solving 
are outlined and we show how varying the definition 
of irrelevance in our space affects the satisfaction of 
these properties. Next, we discuss how irrelevance 
claims can serve as justifications for creating an ab- 
straction. The case of irrelevance of a distinction be- 
tween properties (represented as predicates) is exam- 
ined in detail and we show how such a claim serves as a 
justification for predicate abstraction [Plaisted, 1981; 
Tenenberg, 1987]. 

This framework makes several contributions. First, 
it clarifies the issues involved in the notion of irrele- 
vance therefore enabling us to better exploit the notion 
in works that rely on it, such as the work on detect- 
ing redundant facts or creating abstraction hierarchies. 
The properties of irrelevance that we outline provide 
guidance in building a system that incorporates such 
claims. Giving precise definitions of irrelevance 
formalizes the problem of automatically deducing 
irrelevance facts, thereby enabling us to automatically 
create abstractions based on deduced irrelevance claims. 
Moreover, since our framework provides a language to 
express knowledge about irrelevance, we can use this 
language to express knowledge about the domain that 
can help reduce the size of the search or justify creating 
an abstraction. 

Preliminaries 

Assume our theory of the domain is represented by a 
knowledge base of first-order predicate calculus 
formulas, $\Delta$. A problem-solving goal (or query) is 
represented by a formula $\psi$. The goal is to find 
whether $\psi$ is implied by $\Delta$ (or, if $\psi$ has free 
variables, which assignments to the variables result in 
a formula that is entailed by $\Delta$). Our aim is to 
identify facts that are irrelevant to $\psi$ in order to 
reduce the search space generated for $\psi$. Formalizing 
the concept of irrelevance can be done at several levels. 
For example, one can formalize irrelevance in terms of 
the models of $\Delta$ and $\psi$, i.e., a semantic-level 
analysis. Irrelevance can also be analyzed in terms of the 
facts in the theory $\Delta$, a so-called meta-theoretic 
analysis [Subramanian, 1989]. Alternatively, one can give a 
proof-theoretic analysis of irrelevance, in terms of the 
actual set of derivations the problem solver can explore 
in the search to solve $\psi$. Although these levels are by 
no means independent, it is important to distinguish 
between them when defining irrelevance or comparing 
definitions. 

The goal of this paper is to define notions of irrele- 
vance that enable us to optimize actual problem solv- 
ing. Therefore, we analyze irrelevance from the sys- 
tem’s view of the problem-solving process which is a 
proof-theoretic one. The system does not actually see 
the world as the user sees it nor does it see the con- 
ceptualization of the world. Instead, it sees the set of 
symbols used to describe the domain and the set of 
derivations it can generate. 

Example 1: Suppose we are using a resolution theorem 
prover on a knowledge base in clause form¹. Consider 
the following two theories: 

$$T_1 = \{f \Rightarrow g,\ \neg f \Rightarrow g\}$$ 

$$T_2 = \{g\}.$$ 

$T_1$ and $T_2$ are satisfied by the same set of models. In 
each, the value assigned to $f$ does not affect the value 
of $g$, and therefore we might consider $f$ to be irrelevant 
to $g$. However, in $T_1$, the theorem prover will have to 
resolve on the symbol $f$ to derive $g$, and therefore, as 
far as it is concerned, it cannot ignore the symbol $f$. □ 
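
The point of Example 1 can be checked mechanically. The following is a minimal sketch of propositional resolution, written for this illustration rather than taken from any particular theorem prover; `negate`, `saturate`, and the `~` literal syntax are assumptions of the sketch. It records which atoms were resolved upon to obtain each derived clause.

```python
from itertools import combinations


def negate(lit):
    """Complement of a propositional literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def saturate(clauses):
    """Close a clause set under binary resolution.

    Returns a dict mapping every clause obtained to the set of atoms
    resolved upon in the first derivation found for it.
    """
    used = {frozenset(c): frozenset() for c in clauses}
    changed = True
    while changed:
        changed = False
        for c1, c2 in combinations(list(used), 2):
            for lit in c1:
                if negate(lit) in c2:
                    resolvent = (c1 - {lit}) | (c2 - {negate(lit)})
                    if resolvent not in used:
                        used[resolvent] = used[c1] | used[c2] | {lit.lstrip("~")}
                        changed = True
    return used


T1 = [{"~f", "g"}, {"f", "g"}]  # clause forms of f => g and ~f => g
T2 = [{"g"}]
G = frozenset({"g"})

atoms_t1 = saturate(T1)[G]  # atoms resolved upon to reach g in T1
atoms_t2 = saturate(T2)[G]  # g is already a base clause in T2
```

In this sketch `atoms_t1` contains `f` while `atoms_t2` is empty, which mirrors the observation that the two theories have the same models but present different derivations to the prover.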

Note that we are not claiming that irrelevance rela- 
tions in the domain are not useful to control problem 
solving; quite the contrary. Most irrelevance facts are 

1 For clarity, in this document we do not use clause form 
notation but assume the problem solver gets formulas in 
clause form. 


based on properties of the domain. However, a rele- 
vance relation in the domain will only be useful if it is 
reflected in the representation. 

In particular, for a problem solver to exploit 
irrelevance claims, the following properties of irrelevance 
claims will be of interest. Assume $IR(\varphi, \psi, \Delta)$ denotes 
that the fact (or set of facts) $\varphi$ is irrelevant to the goal 
$\psi$ with respect to the theory $\Delta$. 

• What can the problem solver do given the irrele- 
vance claim? Can it ignore a fact that is deemed 
irrelevant? Can it ignore any fact that contains it as 
a subexpression? 

• Do irrelevance claims add up? If $IR(\varphi_1, \psi, \Delta)$ 
and $IR(\varphi_2, \psi, \Delta)$ hold, does that imply that 
$IR(\{\varphi_1, \varphi_2\}, \psi, \Delta)$ holds? If so, we can use all the 
irrelevance claims that are available to us at a given 
instant. However, if not, we can only use one at a 
time, and then we must check that the others still 
hold in the resulting theory. 

• Is irrelevance a monotonic property? I.e., if we add 
more facts to the knowledge base, can irrelevant facts 
become relevant or vice versa? 

• Does the irrelevance of a subject imply the irrelevance 
of a subject which is syntactically related to 
it? E.g., does $IR(\varphi, \psi, \Delta)$ imply $IR(\neg\varphi, \psi, \Delta)$ or 
$IR(\varphi \lor \varphi_1, \psi, \Delta)$? Such properties will enable us to use 
a given set of irrelevance claims to deduce additional 
ones. 

• Can irrelevance claims be found automatically by 
examining the KB? 

An important issue in a definition of irrelevance is 
the subject of irrelevance, i.e., the type of entity being 
deemed irrelevant to the goal. So far we discussed only 
the irrelevance of a fact (or set of facts) to a problem 
solving goal, but the subject may be any kind of entity 
in the representation, such as the objects-constants, 
predicate-symbols and functions. The irrelevance sub- 
ject can also be more abstract such as a decision to 
distinguish between a set of predicates or objects in 
the representation. The following is an example of the 
irrelevance of a predicate distinction. 

Example 2: Consider the knowledge base with the 
following facts. 

$r_1$: $SportsCar(x) \Rightarrow Vehicle(x)$ 

$r_2$: $FamilyCar(x) \Rightarrow Vehicle(x)$ 

$r_3$: $SportsCar(x) \Rightarrow HighRiskInsurance(x)$ 

$r_4$: $FamilyCar(x) \Rightarrow \neg SportsCar(x)$ 

$r_5$: $FamilyCar(Camry)$ 

In order to solve the query $Vehicle(x)$, the distinction 
between the relations $SportsCar$ and $FamilyCar$ 
is irrelevant. Intuitively, all that matters for the proof 
is that $x$ is some kind of car. Therefore, we can remove 
the distinction in the representation by predicate 
abstraction [Tenenberg, 1987]. We express the theory 
using an abstract predicate, $Car$, as follows: 

$s_1$: $Car(x) \Rightarrow Vehicle(x)$ 

$s_2$: $Car(Camry)$ 

$r_1$ and $r_2$ were abstracted to $s_1$, while $r_5$ was 
abstracted to $s_2$. $r_3$, on the other hand, was a rule specific 
to $SportsCar$, because 
$FamilyCar(x) \Rightarrow HighRiskInsurance(x)$ 
does not hold, and therefore we cannot abstract it to 
$Car(x) \Rightarrow HighRiskInsurance(x)$. 

Consequently, it is removed from the theory. Similarly, 
$r_4$ is a formula that distinguishes between the 
relations $FamilyCar$ and $SportsCar$ and therefore is 
removed from a theory that ignores the distinction 
between these relations. □ 
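
The abstraction step in Example 2 can be sketched as a small syntactic procedure. This is a simplified illustration under stated assumptions, not Tenenberg's full definition of predicate abstraction: rules are restricted to one body predicate and one head predicate, and `abstract_theory` is a hypothetical helper. A rule over a merged predicate is kept only if the same rule is present for every merged predicate, and rules whose head distinguishes between merged predicates are dropped.

```python
# Rules as name -> (body predicate, head predicate); '~' marks a negated head.
RULES = {
    "r1": ("SportsCar", "Vehicle"),
    "r2": ("FamilyCar", "Vehicle"),
    "r3": ("SportsCar", "HighRiskInsurance"),
    "r4": ("FamilyCar", "~SportsCar"),
}
# Ground facts as name -> (predicate, constant).
FACTS = {"r5": ("FamilyCar", "Camry")}


def abstract_theory(rules, facts, merged, abstract):
    """Merge the predicates in `merged` into the single predicate `abstract`."""
    merged = set(merged)
    bodies_for_head = {}
    for body, head in rules.values():
        bodies_for_head.setdefault(head, set()).add(body)

    new_rules = set()
    for body, head in rules.values():
        if head.lstrip("~") in merged:
            continue  # the rule distinguishes between merged predicates (r4)
        if body in merged:
            # Keep the rule only if it holds for every merged predicate.
            if merged <= bodies_for_head[head]:
                new_rules.add((abstract, head))
        else:
            new_rules.add((body, head))

    new_facts = {(abstract if p in merged else p, c) for p, c in facts.values()}
    return new_rules, new_facts


abs_rules, abs_facts = abstract_theory(
    RULES, FACTS, {"SportsCar", "FamilyCar"}, "Car"
)
```

With these inputs the sketch yields the single rule `Car => Vehicle` and the fact `Car(Camry)`; the rules for high-risk insurance and for the mutual exclusion of the two car types are removed, as in the example.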

A final issue that factors into a definition of irrele- 
vance is the space of possible changes of the representa- 
tions and the theory (or weakenings of the theory [Sub- 
ramanian and Genesereth, 1987]) we are considering in 
order to remove the irrelevancy. In example 1, we only 
considered changing the theory by removing facts and 
therefore we could not justifiably say that $f$ is 
irrelevant to $g$. However, had we considered changing the 
theory by adding some of its logical consequences (e.g., 
$g$), we could deem $f$ irrelevant to $g$. In example 2, the 
irrelevancy was removed by predicate abstraction, i.e., 
replacing the predicates FamilyCar and SportsCar 
by an abstract predicate Car. 

A Space of Irrelevancies 

To capture the various properties of irrelevance we 
define a space of possible definitions of irrelevance. The 
space of definitions revolves around the set of possible 
derivations of the goal. Let $\Delta$ be a knowledge base, $\psi$ 
be a goal and $\mathcal{D}$ be the set of derivations of $\psi$ from 
$\Delta$. A definition of irrelevance of $\varphi$ (which can be any 
irrelevance subject) to $\psi$ is composed of the following 
choices: 

A1. Defining irrelevance of $\varphi$ with respect to a single 
derivation, $D \in \mathcal{D}$. 

A2. A subset $\mathcal{D}_0$ of $\mathcal{D}$ over which to quantify A1. 

A3. The method of quantification over $\mathcal{D}_0$, i.e., 
existentially or universally. 

Formally, if $D$ is a derivation of a goal $\psi$ from a 
knowledge base $\Delta$, we denote the choice for A1 by 
$Ir(\varphi, \psi, D)$, i.e., that $\varphi$ is irrelevant to the derivation 
$D$ of the goal $\psi$. If $\Phi$ is a set of facts, $Ir(\Phi, \psi, D)$ holds 
if $Ir(\varphi_i, \psi, D)$ holds for all $\varphi_i \in \Phi$. 

Definition 3: Let $\mathcal{D}_0$ be a set of derivations of a 
goal $\psi$ from the knowledge base $\Delta$ (footnote 2). $\varphi$ is said to 
be weakly irrelevant to $\psi$ with respect to $\mathcal{D}_0$, 
denoted by $WI(\varphi, \psi, \mathcal{D}_0)$ (footnote 3), if $Ir(\varphi, \psi, D)$ holds for some 
$D \in \mathcal{D}_0$. $\varphi$ is said to be strongly irrelevant, denoted by 
$SI(\varphi, \psi, \mathcal{D}_0)$, if $Ir(\varphi, \psi, D)$ holds for every $D \in \mathcal{D}_0$. □ 

2 If $\psi$ is a set of goals (e.g., a goal with free variables) we 
consider a set of derivations for every element of $\psi$. The 
definitions below hold if they hold for every element of $\psi$. 

3 Note that the knowledge base $\Delta$ is implicit in the third 
argument of WI and SI. 

Note that in Definition 3, the knowledge base $\Delta$ 
does not appear explicitly in WI (SI), but is implicit 
in the set $\mathcal{D}_0$. For every choice of $Ir$ and of $\mathcal{D}_0$, we get 
a definition for weak and strong irrelevance. Besides 
$\mathcal{D}_0 = \mathcal{D}$, examples of $\mathcal{D}_0$ include the set of all 
minimal derivations⁴, or all derivations bounded by some 
resource constraints. The following example clarifies 
some of these distinctions. 

Example 4: Consider a knowledge base with the 
following rules: 

$r_1$: $E(x) \Rightarrow Q(x)$ 
$r_2$: $R(x) \Rightarrow Q(x)$ 
$r_3$: $P(x) \Rightarrow Q(x)$ 
$r_4$: $E(x) \Rightarrow P(x)$ 
$r_5$: $Q(x) \Rightarrow P(x)$ 

Figure 1: Search space for a goal $Q(x)$ 

The knowledge base also contains a set of ground 
facts but only for the predicate $E$. Figure 1 shows the 
possible derivations that can be generated for $Q$ from 
this theory. Suppose we define $Ir(r, \psi, D)$ to hold if 
the rule $r$ does not appear in the derivation $D$. Let $\mathcal{D}$ 
be the set of all derivations of $Q(a)$ (footnote 5). $WI(r_3, Q(a), \mathcal{D})$ 
holds since whenever $Q(a)$ is derivable, there will be 
a derivation of $Q(a)$ using only $r_1$. $SI(r_2, Q(a), \mathcal{D})$ 
holds because $r_2$ cannot be part of a proof of $Q(a)$. 
$SI(r_5, Q(a), \mathcal{D})$ does not hold; however, if we consider 
the set of non-redundant derivations $\mathcal{D}_0$ (footnote 6), then 
$SI(r_5, Q(a), \mathcal{D}_0)$ holds. □ 
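
Weak and strong irrelevance are an existential and a universal quantification over a chosen set of derivations, which can be sketched directly. The code below is an illustration under stated assumptions: each derivation of Q(a) is abbreviated to the sequence of rule names it applies, E(a) is assumed to be in the knowledge base, and because the full set of derivations is infinite (r5 permits loops) `SAMPLE_ALL` is only a truncated sample. All function and variable names are hypothetical.

```python
def weakly_irrelevant(ir, subject, goal, derivations):
    """WI: the single-derivation test holds for SOME derivation in the set."""
    return any(ir(subject, goal, d) for d in derivations)


def strongly_irrelevant(ir, subject, goal, derivations):
    """SI: the single-derivation test holds for EVERY derivation in the set."""
    return all(ir(subject, goal, d) for d in derivations)


def rule_absent(rule, goal, derivation):
    """Choice for A1 in Example 4: the rule does not occur in the derivation."""
    return rule not in derivation


# Derivations of Q(a), listed from the goal downward (assumes E(a) holds).
MINIMAL = [("r1",), ("r3", "r4")]
# Truncated sample of all derivations, including two redundant ones.
SAMPLE_ALL = MINIMAL + [("r3", "r5", "r1"), ("r3", "r5", "r3", "r4")]

wi_r3 = weakly_irrelevant(rule_absent, "r3", "Q(a)", SAMPLE_ALL)
si_r2 = strongly_irrelevant(rule_absent, "r2", "Q(a)", SAMPLE_ALL)
si_r5_all = strongly_irrelevant(rule_absent, "r5", "Q(a)", SAMPLE_ALL)
si_r5_min = strongly_irrelevant(rule_absent, "r5", "Q(a)", MINIMAL)
```

On this sample the results agree with the example: r3 is weakly irrelevant, r2 is strongly irrelevant, and r5 is strongly irrelevant only once attention is restricted to the non-redundant derivations. Note that with an empty derivation set the universal test is vacuously true while the existential test is false.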

Irrelevance of a Fact 

In this section we briefly consider the case in which the 
relevance subject is a single fact, and show how vary- 
ing the choices for A1-A3 affects the properties of the 
resulting irrelevance claims. The definitions consider a 
specific problem solver, hence our discussion assumes 
we are using a resolution theorem prover. A derivation 

4 Given some criteria of minimality of deductions. 

5 Which will be empty if $E(a)$ is not in the knowledge 
base. 

6 A derivation tree is redundant if it has two identical 
nodes $n_1$ and $n_2$ such that $n_1$ is an ancestor of $n_2$. 
is a resolution tree of clauses, where the goal clause 
(or the empty clause in case of a refutation proof) is 
the root, and the children of every clause are the two 
clauses that were resolved in order to get it. The leaves 
of the tree are clauses from the knowledge base, and 
they are denoted by Base(D). 

Consider three choices for A1. In the first, a fact 
is irrelevant to a derivation of the goal if it is not one 
of the knowledge-base facts used in it: 

Definition 5: $Ir_1(\varphi, \psi, D)$ if $\varphi \notin Base(D)$. □ 

A stronger definition requires that $\varphi$ is irrelevant to 
a derivation if it appears nowhere in the derivation: 

Definition 6: $Ir_2(\varphi, \psi, D)$ iff there does not exist a 
substitution $\sigma$ such that $\varphi\sigma$ is a subclause of a clause 
in $D$. □ 

Subramanian [Subramanian, 1989] defines $\varphi$ to be 
irrelevant to $\psi$ with respect to a theory $\Delta$, if there is 
a subset of $\Delta$ that entails $\psi$ but is non-committal on 
$\varphi$. In our space, we can formalize this as follows: 

Definition 7: $Ir_3(\varphi, \psi, D)$ if $Base(D) \not\models \varphi$ and 
$Base(D) \not\models \neg\varphi$. □ 

Using $Ir_3$, for a refutation resolution theorem 
prover, $WI(\varphi, \psi, \mathcal{D})$ is equivalent to the definition 
given in [Subramanian, 1989]. 

Figure 2 summarizes the different properties that hold 
for the definitions described above. The following 
observations show how the properties of weak irrelevance 
differ from those of strong irrelevance. 

Observation 8: Whenever irrelevance adds up on a 
single derivation, it will add up for strong irrelevance, 
i.e., if 

$$Ir(\Phi_1, \psi, D) \land Ir(\Phi_2, \psi, D) \Rightarrow Ir(\{\Phi_1, \Phi_2\}, \psi, D)$$ 

holds for any $D$, then for any choice of $\mathcal{D}_0$, 

$$SI(\Phi_1, \psi, \mathcal{D}_0) \land SI(\Phi_2, \psi, \mathcal{D}_0) \Rightarrow SI(\{\Phi_1, \Phi_2\}, \psi, \mathcal{D}_0).$$ 

This property does not hold for weak irrelevance. □ 

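Observation 9: The converse holds for weak irrelevance 
too, i.e., whenever 

$$Ir(\{\Phi_1, \Phi_2\}, \psi, D) \Rightarrow Ir(\Phi_1, \psi, D) \land Ir(\Phi_2, \psi, D)$$ 

holds for any $D$, then for any choice of $\mathcal{D}_0$, 

$$WI(\{\Phi_1, \Phi_2\}, \psi, \mathcal{D}_0) \Rightarrow WI(\Phi_1, \psi, \mathcal{D}_0) \land WI(\Phi_2, \psi, \mathcal{D}_0)$$ 

$$SI(\{\Phi_1, \Phi_2\}, \psi, \mathcal{D}_0) \Rightarrow SI(\Phi_1, \psi, \mathcal{D}_0) \land SI(\Phi_2, \psi, \mathcal{D}_0)$$ 

□ 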

Observation 10: For any definition of $Ir$ such that 
$Ir(\varphi, \psi, D) \Rightarrow Ir_1(\varphi, \psi, D)$, if we add facts to the 
knowledge base, irrelevance can change as follows. A 
fact that was weakly irrelevant will still be weakly 
irrelevant. A fact that was strongly irrelevant will be 
at least weakly irrelevant. A fact that was not weakly 
irrelevant might become weakly irrelevant. □ 
P1: $WI(\varphi, \psi, \mathcal{D})$ implies that $\Delta \setminus \varphi \vdash \psi$. 

P2: $WI(\varphi, \psi, \mathcal{D})$ implies that the problem solver 
can ignore any derivation that contains $\varphi$. 

P3: $WI(\varphi, \psi, \mathcal{D})$ implies that the problem solver 
can ignore any derivation that contains 
$\varphi$ as a subexpression. 

P4: Adding up - 
$Ir(\Phi_1, \psi, D) \land Ir(\Phi_2, \psi, D) \Rightarrow Ir(\{\Phi_1, \Phi_2\}, \psi, D)$. 

P5: Transfers through equivalence - 
$Ir(\varphi_1, \psi, D) \land (\varphi_1 \equiv \varphi_2) \Rightarrow Ir(\varphi_2, \psi, D)$. 

P6: If $\varphi$ is a subclause of $\varphi_1$, then 
$Ir(\varphi, \psi, D) \Rightarrow Ir(\varphi_1, \psi, D)$. 

Figure 2: Properties of Irrelevance 


Deducing Irrelevance Claims 

Varying the definition of irrelevance has drastic effects 
on the ability to automatically derive irrelevance 
claims. Given a knowledge base $\Delta$ and a goal $\psi$, we 
would like to derive all (or part of) the facts in $\Delta$ that 
are irrelevant to $\psi$. In general, looking at the whole 
knowledge base to determine irrelevance will be more 
costly than solving the query. A more interesting ques- 
tion is whether irrelevance claims can be derived by 
looking at only a small and stable part of the knowl- 
edge base. For example, in example 4, we were able to 
determine irrelevance by merely looking at the struc- 
ture of the proof space created by the rules, regardless 
of the specific ground facts for the predicate E. 

We examine this question for knowledge bases com- 
prised of a set of Horn rules with no function symbols 
(Datalog, [Ullman, 1989]), and a database of ground 
atomic facts. We distinguish between two sets of pred- 
icates in the knowledge base, the extensional predicates 
(EDB predicates) which are those that appear only 
in the database and in antecedents of rules, and the 
intensional predicates (IDB predicates) which are the 
predicates appearing in the consequents of the rules, 
i.e., the predicates that are being defined by the EDB 
predicates and the rules. A query is an IDB predicate, 
i.e., to find all the derivable facts for that predicate. 
Every derivable instance of the goal has a (perhaps 
more than one) derivation tree. A derivation tree is 
a tree consisting of goal-nodes and rule-nodes. A goal 



node is labeled by a ground atom, and it has a single 
child, which is an instantiated rule-node. The head 
of an instantiated rule-node is identical to its parent 
goal-node. A rule-node has a child goal-node for each 
one of its subgoals. The leaves of a derivation tree are 
goal-nodes labeled by ground atoms from the EDB. A 
derivation is not minimal (or redundant) if there are 
two identical goal-nodes $n_1$ and $n_2$, such that $n_1$ is an 
ancestor of $n_2$. A rule $r$ is irrelevant to a derivation $D$ 
(i.e., $Ir(r, \psi, D)$) if none of the rule nodes in $D$ are 
instances of $r$ (note that this is equivalent to $Ir_1$ and 
$Ir_2$). 
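
The derivation trees described above can be represented with two small node types. The following sketch is one possible encoding, with hypothetical class and function names; it checks the redundancy condition (a goal-node repeated below an identical ancestor) and collects the rules used, using two derivations of Q(a) from Example 4 as data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GoalNode:
    atom: str                           # ground atom, e.g. "Q(a)"
    rule: Optional["RuleNode"] = None   # None for leaves (EDB facts)


@dataclass
class RuleNode:
    name: str                           # name of the instantiated rule
    subgoals: List[GoalNode] = field(default_factory=list)


def is_redundant(node, ancestors=frozenset()):
    """True if some goal-node repeats the atom of one of its ancestors."""
    if node.atom in ancestors:
        return True
    if node.rule is None:
        return False
    below = ancestors | {node.atom}
    return any(is_redundant(g, below) for g in node.rule.subgoals)


def rules_used(node):
    """Names of all rules whose instances occur in the derivation tree."""
    if node.rule is None:
        return set()
    used = {node.rule.name}
    for g in node.rule.subgoals:
        used |= rules_used(g)
    return used


# Q(a) derived directly from E(a) via r1.
minimal = GoalNode("Q(a)", RuleNode("r1", [GoalNode("E(a)")]))
# Q(a) via r3 from P(a), which is derived via r5 from Q(a) again.
looping = GoalNode(
    "Q(a)",
    RuleNode("r3", [GoalNode("P(a)", RuleNode("r5", [minimal]))]),
)
```

A rule is then irrelevant to a derivation exactly when its name is missing from `rules_used`, and restricting attention to trees for which `is_redundant` is false gives the set of minimal derivations.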

The question we address is the following. Given a set 
of rules, $\mathcal{P}$, a query $q$ and a definition of irrelevance, 
can we determine whether a rule $r \in \mathcal{P}$ is irrelevant 
to the query for any possible set of ground facts in the 
knowledge base? We consider two choices for A2, the 
set of all derivations of the goal $q$, denoted by $\mathcal{D}$, and 
the set of all minimal derivations, $\mathcal{D}_0$. 

Finding irrelevant rules enables us to significantly 
prune the search space for the query. In example 4, 
rule $r_2$ will not appear in any derivation of 
$Q$, therefore $SI(r_2, Q(x), \mathcal{D})$ holds. $r_5$ will appear 
only in redundant derivations of $Q$ and therefore 
$SI(r_5, Q(x), \mathcal{D}_0)$ holds. Since $Q(x)$ can always be 
derived using either $r_1$ or $\{r_3, r_4\}$, both $WI(r_1, Q(x), \mathcal{D})$ 
and $WI(\{r_3, r_4\}, Q(x), \mathcal{D})$ hold. Consequently, 
identifying the various kinds of irrelevance can enable us 
to compute $Q$ using only $r_1$. Considering constraint 
literals in the rules enables us to derive additional 
irrelevance claims: 


Example 11: Consider the following knowledge base: 

$s_1$: $Q(x, z) \land Q_1(z, y) \land x < z \Rightarrow P(x, y)$ 
$s_2$: $Q(z, x) \land Q_1(x, y) \land x < y \Rightarrow P(x, y)$ 
$s_3$: $E_1(x, y) \land x < 3 \Rightarrow Q(x, y)$ 
$s_4$: $E_2(x, y) \land x > 1 \Rightarrow Q_1(x, y)$ 

If the query is $P(x, y)$, all rules are relevant. However, 
if the query is $P(x, y) \land (y < 1)$, then $s_2$ is strongly 
irrelevant, i.e., $SI(s_2, P(x, y) \land (y < 1), \mathcal{D})$. □ 
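
The reason the second rule drops out can be spelled out by chaining the constraint literals. The following is a sketch of that reasoning, assuming the rules read as $s_2$: $Q(z,x) \land Q_1(x,y) \land x < y \Rightarrow P(x,y)$ and $s_4$: $E_2(x,y) \land x > 1 \Rightarrow Q_1(x,y)$, and that $Q_1$ is defined only by $s_4$:

```latex
\begin{align*}
Q_1(x, y) \text{ derived via } s_4 &\;\Longrightarrow\; x > 1, \\
\text{body of } s_2 &\;\Longrightarrow\; x < y, \\
\text{hence } y > x > 1, &\quad \text{which contradicts the query constraint } y < 1 .
\end{align*}
```

No instance of $s_2$ can therefore occur in a derivation of the constrained query, whereas $s_1$ places no bound on $y$ and remains usable.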


Finding all rules which are weakly irrelevant, i.e., 
$WI(r, q, \mathcal{D})$, is precisely the rule redundancy problem 
shown to be undecidable by Shmueli [Shmueli, 1987]. 
Consequently, determining $WI(r, q, \mathcal{D}_0)$ is also 
undecidable. For strong irrelevance, if the rules contain 
no constraint literals and no object constants, 
determining $SI(r, q, \mathcal{D}_0)$ is equivalent to the rule 
reachability problem that has an easy polynomial-time 
solution [Kifer, 1988]. [Levy and Sagiv, 1992] gives an 
algorithm for detecting $SI(r, q, \mathcal{D}_0)$ and $SI(r, q, \mathcal{D})$ even 
when constraint literals are present. It also establishes 
an exponential-time lower bound on the problem of 
determining $SI(r, q, \mathcal{D}_0)$. 
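
To give a feel for a reachability-style check, the sketch below computes which rules can occur in some derivation of a query at all, working only at the level of predicate names. It is a simplified illustration written for this text, not the algorithm of [Kifer, 1988] or [Levy and Sagiv, 1992]: it assumes rules without constants or constraint literals, it takes as input the set of EDB predicates that may have facts, and it does not account for the minimality restriction. The function name and data layout are hypothetical.

```python
def possibly_used_rules(rules, query, edb_with_facts):
    """Return names of rules that can occur in some derivation of `query`.

    rules: dict mapping rule name -> (head predicate, list of body predicates).
    """
    # Bottom-up pass: predicates that can have at least one derivable fact.
    productive = set(edb_with_facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules.values():
            if head not in productive and all(b in productive for b in body):
                productive.add(head)
                changed = True

    # Only rules whose whole body is productive can ever fire.
    live = {
        name: (head, body)
        for name, (head, body) in rules.items()
        if all(b in productive for b in body)
    }

    # Top-down pass: rules reachable from the query predicate.
    reachable, frontier, used = {query}, [query], set()
    while frontier:
        pred = frontier.pop()
        for name, (head, body) in live.items():
            if head == pred:
                used.add(name)
                for b in body:
                    if b not in reachable:
                        reachable.add(b)
                        frontier.append(b)
    return used


# Rules of Example 4, with ground facts available only for E.
EXAMPLE4 = {
    "r1": ("Q", ["E"]),
    "r2": ("Q", ["R"]),
    "r3": ("Q", ["P"]),
    "r4": ("P", ["E"]),
    "r5": ("P", ["Q"]),
}
used = possibly_used_rules(EXAMPLE4, "Q", {"E"})
never_used = set(EXAMPLE4) - used
```

For Example 4 the only rule that can never be used is r2, which matches the claim that r2 is strongly irrelevant with respect to all derivations. Rule r5 is still reported as usable, because detecting that it occurs only in redundant derivations requires reasoning about minimality, which this sketch omits.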


Using Irrelevance to Justify 
Abstractions 

Much of the work in AI on creating abstraction hierar- 
chies relies on the intuition that by creating an abstract 
theory we are removing some irrelevant detail. If the 
detail removed is indeed irrelevant, then a solution to 
the problem in the abstract theory will map back to 
a solution in the original theory (also referred to as 
the ground theory). Otherwise, we will have to back- 
track between abstraction levels. Although this has 
been the motivation underlying work on abstractions, 
the formal connection between irrelevance and abstrac- 
tions has received little attention (e.g., [Subramanian, 
1989]). For example we can view predicate abstraction 
as being justified by the irrelevance of a distinction be- 
tween predicates; object aggregation can be justified 
by irrelevance of a granularity distinction. Identifying 
abstraction with the notion of irrelevance offers several 
advantages: 

• We make explicit what is being abstracted (i.e., the 
subject of irrelevance). 

• We make clear the strength of the justification for 
the abstraction (by the strength of the type of irrel- 
evance claim that holds). 

• We formalize the problem of automatically creating 
abstractions by translating it to the problem of au- 
tomatically finding irrelevance claims. 

In this section we briefly discuss how irrelevance 
claims that are justifications for abstractions can be 
formulated in our framework. We identify several ir- 
relevance subjects that account for many abstractions 
discussed in the literature. As a consequence we get 
an expressive language to state knowledge about the 
domain that can affect the creation of abstractions. 
We define a notion of irrelevance that best justifies ab- 
stractions and mention several weaker notions. 

The first assumption underlying a formalization of 
irrelevance is that removing irrelevant detail should 
not enable us to reach new conclusions about the set of 
goals we are interested in, i.e., any conclusion reached 
in the abstract theory should be an abstraction of one 
in the base theory (this is also known as a TD property 
of abstractions [Giunchiglia and Walsh, 1991] or the 
downward solution property [Tenenberg, 1987]). The 
justification for this claim is that by removing irrel- 
evant detail, we are effectively ignoring some of our 
knowledge, and therefore, we cannot come to new 
conclusions⁷. For example, when we remove some 
irrelevant detail in a planning problem (e.g., an action 
precondition), if the resulting abstract plan cannot be 
mapped back to a base-level plan, the detail we have 
removed was not truly irrelevant to the problem⁸. 
Second, the abstract theory should not prevent us from 
solving the goal, i.e., if the original theory had a 
solution to the goal, then the abstract one should too. 
Finally, in order for the abstraction to be 
computationally effective, the solutions that are preserved by 
the abstraction should be the cheaper ones. 

7 As long as our reasoning has no form of 
non-monotonicity. 

8 Note that this does not necessarily mean that the 
abstraction is not useful! 

These criteria are naturally formulated in our
framework. Recall that in order to define irrelevance of a
subject $\alpha$, we must give a definition for $Ir(\alpha, \psi, D)$,
i.e., when the subject $\alpha$ is irrelevant to a derivation $D$.
Given a theory $A$, we denote the abstract theory
resulting from removing the irrelevancy $\alpha$ by $f_\alpha(A)$. For
example, if $\alpha$ is a distinction between predicates, $f_\alpha(A)$
is the theory resulting from predicate abstraction. The
exact form of $f_\alpha(A)$ is discussed in the next section.
We base our definition of $Ir$ on a mapping $h_\alpha$ from the
derivations of $\psi$ in $A$, denoted by $\mathcal{D}_1$, to the derivations
of $f_\alpha(\psi)$ in $f_\alpha(A)$, denoted by $\mathcal{D}_2$. The only requirement on
$h_\alpha$ is that it is onto $\mathcal{D}_2$. $h_\alpha$ need not be a total mapping on
$\mathcal{D}_1$, i.e., there might be derivations of $\psi$ that are not
mapped to the abstract theory, and it need not be
1-1. Other constraints on $h_\alpha$ yield stronger forms
of irrelevance and therefore stronger justifications for
the abstraction (for example, $h_\alpha$ is called a
simplifying mapping if for any $D \in \mathcal{D}_1$, the cost of $h_\alpha(D)$ is
no more than the cost of $D$^9). Given $h_\alpha$, $Ir(\alpha, \psi, D)$
is defined as follows:

Definition 12: $Ir(\alpha, \psi, D)$ is true iff $h_\alpha(D)$ is not
empty.

Note that in this definition $h_\alpha$ is dependent on $\alpha$
and $\psi$. Definitions of weak and strong irrelevance are
obtained by quantifying the definition of Ir over a cho- 
sen set of derivations. The following states that the 
first two requirements of an abstraction are satisfied 
by weak irrelevance. 

Observation 13: If $\mathcal{D}_0$ is a set of derivations in $\mathcal{D}_1$,
and $WI(\alpha, \psi, \mathcal{D}_0)$ holds, then $\psi$ is provable from $A$ if
and only if $f_\alpha(\psi)$ is provable from $f_\alpha(A)$.

In order to satisfy the third requirement, we must
impose a restriction on $\mathcal{D}_0$.

Observation 14: If $\mathcal{D}_0$ is a set of derivations that
contains all minimal derivations and $h_\alpha$ is a simplifying
mapping, then if $SI(\alpha, \psi, \mathcal{D}_0)$ holds, $f_\alpha(\psi)$ will have a
solution in the abstract theory if and only if $\psi$ has one
in the original theory, and at least one of the abstract-level
solutions will cost no more than the cheapest solution
in the original theory.
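
The definitions above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the dictionary encoding of $h_\alpha$ and all names are hypothetical, and since the definitions of weak and strong irrelevance are not restated in this section, the sketch assumes that WI quantifies existentially and SI universally over the chosen set of derivations.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Derivation:
    """A derivation, named for illustration, with a cost such as proof-tree size."""
    name: str
    cost: int


# Hypothetical encoding of the partial mapping h_alpha as a dict from
# base-level derivations to abstract-level derivations. A base derivation
# that is absent from the dict has an empty image.
HMap = Dict[Derivation, Derivation]


def ir(h: HMap, d: Derivation) -> bool:
    """Ir(alpha, psi, D) in the sense of Definition 12: h_alpha(D) is not empty."""
    return d in h


def weak_irrelevance(h: HMap, d0: Iterable[Derivation]) -> bool:
    """ASSUMED reading of WI: Ir holds for some derivation in D0."""
    return any(ir(h, d) for d in d0)


def strong_irrelevance(h: HMap, d0: Iterable[Derivation]) -> bool:
    """ASSUMED reading of SI: Ir holds for every derivation in D0."""
    return all(ir(h, d) for d in d0)


def is_onto(h: HMap, abstract: Iterable[Derivation]) -> bool:
    """Every abstract derivation is the image of some base derivation."""
    return set(abstract) <= set(h.values())


def is_simplifying(h: HMap) -> bool:
    """No image costs more than the base derivation it comes from."""
    return all(image.cost <= base.cost for base, image in h.items())


# Toy data: two base derivations of the goal, one abstract derivation.
short = Derivation("base-short", cost=3)
long_ = Derivation("base-long", cost=7)
abstract = Derivation("abstract", cost=2)
h_alpha = {short: abstract}  # long_ has no image

print(weak_irrelevance(h_alpha, [short, long_]))    # True
print(strong_irrelevance(h_alpha, [short, long_]))  # False
print(strong_irrelevance(h_alpha, [short]))         # True
print(is_onto(h_alpha, [abstract]), is_simplifying(h_alpha))  # True True
```

In this toy case the cheapest base derivation is preserved and its image is cheaper, which is the situation Observation 14 describes.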

This condition is a sound justification for creating
the abstraction. Imposing more constraints on $h_\alpha$ will
give us even stronger justifications. For example, we
can require that $h_\alpha(D)$ effectively break up $D$ into
subproblems of equal size. Knoblock [Knoblock, 1990;
Knoblock et al., 1991] shows how this constraint, along
with others, affects the ability to achieve savings when
employing hierarchical planning.

9 Given some cost model for derivations, such as the
number of nodes in the proof tree.

Weaker relevance claims can also be given to the
system. For example, we can state that a distinction $\alpha_1$
is more relevant than a distinction $\alpha_2$, i.e., whenever
$\alpha_1$ is justifiably abstracted, so is $\alpha_2$. Another kind
of claim is a probabilistic one, i.e., stating to the
system that in most cases $\alpha$ is irrelevant to $\psi$. The system
can then use this claim and succeed in most cases
and backtrack in others. By stating irrelevance claims 
declaratively we can also state under what conditions 
the relevance claim holds. 

In the next section we examine the case of predicate 
abstractions and show they are justified by irrelevance 
of a predicate distinction. 

Irrelevance of Predicate Distinctions 

When designing a representation, a decision has to be 
made about the detail with which to conceptualize the
world. In some cases, identifying a property $P$ (e.g.,
Car(x)) will suffice. In other cases we need to refine
$P$ to subclasses $\mathcal{P} = \{P_1, \ldots, P_n\}$ (e.g., SportsCar(x),
FamilyCar(x), etc.). For some goals, the finer distinction
of properties is irrelevant, and therefore reasoning
will be more efficient if we change the theory by
abstracting the distinction. We would like to be able 
to give the system knowledge about the domain that 
will guide it in deciding when a predicate distinction 
is relevant. To define the meaning of such an irrele- 
vance claim in the framework, we first must define the 
abstract theory resulting from removing the predicate 
distinction and the mapping of derivations between the 
original and abstract theories. 

The Abstract Theory 

Suppose we have a theory $A$ that uses a set of
predicates $\mathcal{P} = \{P_1, \ldots, P_n\}$, and we want to abstract
the distinction between them by replacing them by a
predicate $P$ that represents their union (e.g., we want
to replace {FamilyCar, SportsCar} by the predicate
Car). Intuitively, to abstract the theory $A$, we
replace every occurrence of a predicate in $\mathcal{P}$ in every
formula in $A$ by $P$ (e.g., abstract FamilyCar(x) $\Rightarrow$
Vehicle(x) by Car(x) $\Rightarrow$ Vehicle(x)). However, doing
so for every formula in $A$ might result in an
inconsistent theory or in a theory that entails conclusions
that were not entailed by the original one. In
example 2, abstracting rule $r_4$ will result in a contradiction
(Car(x) $\Rightarrow$ $\neg$Car(x)), and abstracting $r_3$ will result in
a fact that is not entailed by the theory (i.e., Car(x) $\Rightarrow$
HighRiskInsurance(x) does not follow from the
theory). In order to assure that our derivation mapping
will be onto, we need the abstract theory to be
consistent with the ground one. Tenenberg [Tenenberg, 1987;
Tenenberg, 1990] discusses predicate abstractions and
defines the maximal set of formulas that can be
included in the abstract theory such that the abstract
theory will be consistent with the original one. His
definition is based on the interpretation of the abstract
predicate, which is the union of the interpretations of
the predicates in $\mathcal{P}$. However, as Tenenberg notes, this
set is usually infinite even when the ground theory is
finite. Therefore, the abstract theory we consider is
a finite subset of the one defined by Tenenberg. Our
abstract theory consists of the abstractions of the
formulas in the base theory that are independent of the
predicate distinction. Intuitively, a formula is
independent if its abstraction is consistent with the theory.
In the formal definition, we assume that formulas are
represented as clauses. A literal in a clause is negative
if it is a negation of an atomic formula (e.g., $\neg P(x)$
is a negative literal, while $P(x, y)$ is a positive literal).
Neg(C) (Pos(C)) denotes the set of negative (positive)
literals in a clause C.
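
The naive replacement step, and the way it can go wrong, can be sketched in a few lines. This is an illustrative sketch only: the tuple encoding of literals and the helper names are assumptions, and the disjointness rule is a hypothetical stand-in, since example 2 is not reproduced in this section.

```python
from typing import FrozenSet, Tuple

# A literal is (is_positive, predicate, arguments); a clause is a set of literals.
Literal = Tuple[bool, str, Tuple[str, ...]]
Clause = FrozenSet[Literal]


def implication(body: str, head: str, head_positive: bool = True) -> Clause:
    """Clause form of body(x) => head(x), or of body(x) => not head(x)."""
    return frozenset({(False, body, ("x",)), (head_positive, head, ("x",))})


def abstract_clause(clause: Clause, distinction: FrozenSet[str], abstract_pred: str) -> Clause:
    """Naively replace every predicate of the distinction by the abstract predicate."""
    return frozenset(
        (positive, abstract_pred if pred in distinction else pred, args)
        for positive, pred, args in clause
    )


cars = frozenset({"FamilyCar", "SportsCar"})

# FamilyCar(x) => Vehicle(x) abstracts to Car(x) => Vehicle(x).
safe = abstract_clause(implication("FamilyCar", "Vehicle"), cars, "Car")

# Hypothetical disjointness rule FamilyCar(x) => not SportsCar(x): both
# literals collapse to "not Car(x)", i.e. Car(x) => not Car(x).
collapsed = abstract_clause(
    implication("FamilyCar", "SportsCar", head_positive=False), cars, "Car"
)

print(sorted(safe))       # [(False, 'Car', ('x',)), (True, 'Vehicle', ('x',))]
print(sorted(collapsed))  # [(False, 'Car', ('x',))]
```

The second result states that nothing is a car, a conclusion the base theory never supported, which is why only independent clauses are abstracted.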

Definition 15: Independence. Let $\mathcal{P} = \{P_1, \ldots, P_n\}$,
and suppose Neg(C)' is the result of
substituting every occurrence of an element of $\mathcal{P}$ in Neg(C)
by some other predicate in $\mathcal{P}$ using a mapping $f_1$. (Two
occurrences of the same predicate need not have the
same mapping under $f_1$.) A clause C is independent
of a predicate distinction $\mathcal{P}$ with respect to a ground
theory $A$ if for any such $f_1$ there exists a mapping
$f_2$ of the occurrences of elements of $\mathcal{P}$ in Pos(C) to
elements of $\mathcal{P}$, such that Pos(C)' = $f_2$(Pos(C)) and
$A \vdash$ Pos(C)' $\cup$ Neg(C)'^10.

Note, that a clause that contains only positive liter- 
als from V will be independent whenever it is provable 
from the theory. The problem arises with the negative 
literals. In example 2, all rules but $r_3$ are independent
of the distinction {FamilyCar, SportsCar}.

Lemma 16: A clause C is independent of a predicate
distinction $\mathcal{P}$ if and only if $f(C)$ would be included in
the abstract theory as defined by Tenenberg in
[Tenenberg, 1990].
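
For intuition, the independence test can be sketched for a deliberately restricted fragment. The sketch assumes that every rule has the form body(x) $\Rightarrow$ head(x) over unary predicates, so that entailment between predicates reduces to reachability along rules, and it lets the substitution of negative occurrences range over every element of the distinction, including the original predicate. The rules below are hypothetical, in the spirit of the car example, and the function names are not from the paper.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[str, str]  # (body, head) stands for body(x) => head(x)


def reachable(rules: List[Rule], start: str, goal: str) -> bool:
    """True if goal(x) follows from start(x) by chaining the rules."""
    seen, stack = {start}, [start]
    while stack:
        pred = stack.pop()
        if pred == goal:
            return True
        for body, head in rules:
            if body == pred and head not in seen:
                seen.add(head)
                stack.append(head)
    return False


def is_independent(rule: Rule, rules: List[Rule], distinction: FrozenSet[str]) -> bool:
    """Restricted-fragment reading of Definition 15.

    For every substitute of the negative literal (the body) there must be
    some substitute of the positive literal (the head) such that the
    resulting rule is entailed by the theory.
    """
    body, head = rule
    bodies = distinction if body in distinction else {body}
    heads = distinction if head in distinction else {head}
    return all(any(reachable(rules, b, h) for h in heads) for b in bodies)


# Hypothetical theory in the spirit of the car example.
theory = [
    ("FamilyCar", "Vehicle"),
    ("SportsCar", "Vehicle"),
    ("SportsCar", "HighRiskInsurance"),
]
cars = frozenset({"FamilyCar", "SportsCar"})

print(is_independent(("FamilyCar", "Vehicle"), theory, cars))            # True
print(is_independent(("SportsCar", "HighRiskInsurance"), theory, cars))  # False
```

The second rule fails because FamilyCar(x) $\Rightarrow$ HighRiskInsurance(x) is not derivable, so its abstraction would entail a new conclusion.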

The Derivation Mapping 

Given the abstract theory produced by removing the 
predicate distinction, we can define the mapping of 
derivations in the base-theory to those in the abstract 
one. Recall that we require that the mapping be an 
onto mapping. Intuitively, given a derivation in the 
abstract theory, a base-level derivation that is mapped 
to it should be obtainable by reversing the abstraction 
function on the formulas in the derivation. However, 
as the following example shows, this cannot always be 
done. 

Example 17: Consider the following knowledge base:

  $r_1$: $P_1(x) \Rightarrow Q(x)$
  $r_2$: $P_2(x) \Rightarrow P(x)$
  $r_3$: $R(x) \Rightarrow P_1(x)$
  $r_4$:

Suppose we want to abstract $P_1$, $P_2$ by an abstract
predicate $P$. The resulting abstract theory will be:

  $s_1$: $P(x) \Rightarrow Q(x)$
  $s_2$: $R(x) \Rightarrow P(x)$
  $s_3$: $P(a)$

$s_1$ is included in the abstract theory because $r_1$
is independent of the predicate distinction (because
$P_2(x) \Rightarrow Q(x)$ is derivable from the theory).

The (single) derivation of $Q(a)$ in the abstract theory
cannot be trivially mapped to a base-level derivation.
The reason is that it uses $s_1$ and $s_3$, which are
abstractions of $r_1$ and $r_4$, and these do not yield a
base-level derivation of $Q(a)$.

10 Notice that in the definition we use $\vdash$, which assumes
a simple case where the base-level reasoner and the
meta-level reasoner are the same. However, in general, they need
not be the same.

The source of the problem is that some reasoning was 
done in the process of creating the abstract theory. In 
this case, $s_1$ already represented a base-level chain of
reasoning that derived $P_2(x) \Rightarrow Q(x)$.

Informally, we define the derivation mapping, $h_\alpha$,
by specifying all the base-level derivations that map to
a given abstract-level derivation $D$. The mapping is
defined in two steps as follows. Given $D$, we first
construct all the possible mappings in which occurrences
of $P$ in $D$ are mapped to elements of $\mathcal{P}$, such that
the resulting derivation is a valid one. For example, in
Figure 3, the abstract-level derivation (a) has two such
possible mappings, (b) and (c). Next, we try to complete
each of the resulting derivations so that it becomes a
valid derivation in the base-level theory. In our
example, (b) cannot be completed because $P_1(a)$ does not
follow from our original theory; (c), however, can be
completed, as shown in (d). Any such complete
base-level derivation is mapped to $D$ under the mapping
$h_\alpha$. In Figure 3 only (d) is mapped to the abstract-level
derivation (a) (i.e., $h_\alpha(d) = a$).
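
The two steps can be sketched for a restricted fragment. The knowledge base below is a hypothetical one chosen to show the same effect as the example (it is not the knowledge base of Example 17), rules are assumed to have the form body(x) $\Rightarrow$ head(x) over unary predicates, and the abstract derivation is assumed to resolve P(x) $\Rightarrow$ goal(x) with P(c), so both occurrences of P must receive the same base predicate.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[str, str]  # (body, head) stands for body(x) => head(x)
Fact = Tuple[str, str]  # (predicate, constant) stands for predicate(constant)


def reachable(rules: List[Rule], start: str, goal: str) -> bool:
    """True if goal(x) follows from start(x) by chaining the rules."""
    seen, stack = {start}, [start]
    while stack:
        pred = stack.pop()
        if pred == goal:
            return True
        for body, head in rules:
            if body == pred and head not in seen:
                seen.add(head)
                stack.append(head)
    return False


def fact_derivable(rules: List[Rule], facts: List[Fact], pred: str, const: str) -> bool:
    """True if pred(const) follows from the base facts and rules."""
    return any(c == const and reachable(rules, p, pred) for p, c in facts)


def completable_choices(rules: List[Rule], facts: List[Fact],
                        distinction: FrozenSet[str], goal: str, const: str) -> List[str]:
    """Base predicates for which the instantiated derivation can be completed."""
    completed = []
    for pred in sorted(distinction):  # step 1: one candidate per base predicate
        rule_leaf_ok = reachable(rules, pred, goal)              # pred(x) => goal(x)
        fact_leaf_ok = fact_derivable(rules, facts, pred, const)  # pred(const)
        if rule_leaf_ok and fact_leaf_ok:  # step 2: every leaf is derivable
            completed.append(pred)
    return completed


# Hypothetical base theory: P1 => Q, P2 => T, T => Q, and the fact P2(a).
rules = [("P1", "Q"), ("P2", "T"), ("T", "Q")]
facts = [("P2", "a")]

print(completable_choices(rules, facts, frozenset({"P1", "P2"}), "Q", "a"))  # ['P2']
```

The choice P1 fails because P1(a) is not derivable, while P2 succeeds only by unfolding the chain P2 $\Rightarrow$ T $\Rightarrow$ Q that the abstract rule had compressed.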

In order to show that $h_\alpha$ is onto, we must show that
at least one of the intermediate derivations can be
completed to a valid derivation from the base-level theory.

We prove this by defining one mapping $M_D$ from the
occurrences of $P$ in $D$ to $\mathcal{P}$. $M_D$ will have the property
that when we apply it to $D$, the resulting derivation
is guaranteed to have a completion to a valid
base-level derivation. Let $\mathcal{C}$ be the leaves of the
abstract-level derivation $D$ that contain the predicate $P$. We
define $M_D$ on the occurrences of $P$ in $\mathcal{C}$ such that two
literals that are resolved somewhere in $D$ are assigned
the same predicate in $\mathcal{P}$. That ensures that $M_D$ can be
extended to all the occurrences of $P$ in $D$. For clarity,
we assume that $P$ does not appear in the root of $D$,
and that $D$ does not have any non-trivial factoring (see
[Genesereth and Nilsson, 1987]). We define a partial
order < on the clauses in $\mathcal{C}$, and make assignments to
clauses in the topological order induced by <.

Definition 18: For every $C_i, C_j \in \mathcal{C}$, $C_i < C_j$ iff an
ancestor of $C_i$ is resolved with an ancestor of $C_j$ on
a literal in $P$, and the ancestor of $C_i$ contributes the
positive literal to the resolution.

Lemma 19: The relation < is acyclic. 

Figure 3: Mapping base-level derivations to abstract-level derivations


Also note that if $C_i$ is minimal in the order < (i.e.,
there is no $C_j$ such that $C_j < C_i$), then $C_i$ contains
only positive appearances of $P$.

We define $M_D$ on the occurrences of $P$ in $C_i$ only after
we have defined the mapping for all its occurrences in
clauses $C_j$ such that $C_j < C_i$, as follows:

• If $C_i$ contains only positive appearances of $P$, we
map the occurrences of $P$ such that the
resulting clause is entailed by the base-level theory.
Note that by the definition of the abstract theory,
there must be at least one such mapping for $C_i$.

• If $C_i$ contains negative literals of $P$, we do the
following. For any negative occurrence of $P$, the positive
literal with which it is resolved in $D$ has already been
mapped (by the definition of <). Hence
we map it to the same element of $\mathcal{P}$ to which its
counterpart was mapped. As for the positive
literals, any assignment for them such that the resulting
clause is derivable from the base theory is a valid
assignment. The definition of the abstract theory
(i.e., all elements of $\mathcal{C}$ are abstractions of
independent base-level clauses) guarantees that at least one
such assignment exists.

The mapping $M_D$ guarantees that every leaf of the
tree is either in the knowledge base or is derivable from
it. Therefore, the resulting tree can be completed to a
full base-level derivation.
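
The order of assignment can be sketched with a standard topological sort. This is a simplified illustration, not the construction used in the proof: leaves are identified by hypothetical names, each resolution on P is given as a pair (leaf contributing the positive literal, leaf contributing the negative literal), each leaf is assumed to have at most one positive occurrence of P, and `choose_positive` is a hypothetical callback standing in for "pick any assignment that keeps the clause derivable".

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional, Tuple

Resolution = Tuple[str, str]  # (leaf with the positive P-literal, leaf with the negative one)


def assignment_order(leaves: List[str], resolutions: List[Resolution]) -> List[str]:
    """Order the leaves so that positive contributors come first."""
    predecessors = {leaf: set() for leaf in leaves}
    for positive_leaf, negative_leaf in resolutions:
        predecessors[negative_leaf].add(positive_leaf)
    # static_order raises CycleError on a cycle; Lemma 19 rules that case out.
    return list(TopologicalSorter(predecessors).static_order())


def assign(leaves: List[str], resolutions: List[Resolution],
           choose_positive: Callable[[str], Optional[str]]):
    """Assign base predicates to occurrences of P, leaf by leaf."""
    positive: Dict[str, Optional[str]] = {}
    negative: Dict[Resolution, Optional[str]] = {}
    for leaf in assignment_order(leaves, resolutions):
        for pair in resolutions:
            if pair[1] == leaf:
                # A negative occurrence copies the choice of its counterpart.
                negative[pair] = positive[pair[0]]
        positive[leaf] = choose_positive(leaf)
    return positive, negative


# Leaves of a two-step abstract derivation: c1 is P(a), c2 is P(x) => Q(x).
leaves = ["c2", "c1"]
resolutions = [("c1", "c2")]
choices = {"c1": "P2", "c2": None}  # c2 has no positive occurrence of P

positive, negative = assign(leaves, resolutions, choices.get)
print(assignment_order(leaves, resolutions))  # ['c1', 'c2']
print(negative)                               # {('c1', 'c2'): 'P2'}
```

Processing leaves in this order is what guarantees that a negative occurrence always finds its counterpart already assigned.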

Theorem 20: The derivation mapping $h_\alpha$ is well
defined and onto (i.e., every derivation in the abstract
theory has at least one derivation in the base theory
that maps to it), and is a simplifying abstraction.

Properties of the Irrelevance Definition 

Given the definition of irrelevance, the question arises 
whether given the original theory and the abstract one, 
it is possible to decide if the predicate distinction is ir- 
relevant to the goal. The following provides a first step 
in that direction by identifying a class of derivations 
that are preserved by the abstraction. 

Theorem 21: If $\mathcal{D}_0$ is a set of derivations of the goal
such that for any $D \in \mathcal{D}_0$, all the facts in Base(D)
are independent of the predicate distinction $\mathcal{P}$, then
$SI(\mathcal{P}, \psi, \mathcal{D}_0)$ holds.

Observation 22: The converse does not hold, i.e.,
we can have a derivation in the abstract theory but
not have one in the base theory that uses only independent
facts. Example 17 illustrates that.^11

11 Note that if we change the definition of independence to
require Pos(C)' $\cup$ Neg(C)' $\in A$ instead of $A \vdash$ Pos(C)' $\cup$
Neg(C)', we will get the converse direction too, i.e., if a
goal has a proof in the abstract theory, it will have one in
the ground theory in which all facts are independent of the
predicate distinction.

From this condition and the algorithms described
in [Levy and Sagiv, 1992] we can construct an
algorithm for detecting irrelevance of predicate distinctions
in the following case:

Corollary 23: Given a Datalog theory $A$ and $f_{\mathcal{P}}(A)$,
which is the abstract theory resulting from removing
the distinction between predicates in $\mathcal{P}$, there is an
algorithm to determine whether $SI(\mathcal{P}, \psi, \mathcal{D}_0)$ holds for
any given set of ground facts, where $\mathcal{D}_0$ is the set of all
non-redundant derivations of $\psi$ from $A$.

Note that creating the abstract theory is in general
undecidable because it entails solving the rule
redundancy problem. Methods for detecting some classes
of redundant rules (e.g., [Sagiv, 1988]) can be used to
construct a subset of the theory.

Other Relevance Subjects

The same technique described above can be used to
define irrelevance of other kinds of relevance subjects.
[Levy, 1992] discusses the following subjects:

• Object aggregations: We replace a set of object
constants by an aggregate object, e.g., replace the
subparts of a component by one object representing
the component. For example, in the Missionaries
and Cannibals problem [Amarel, 1981], we can
replace the sets of missionaries and cannibals by
objects denoting their sets.

• Object distinction: We replace a set of object
constants by a representative object that has only the
properties common to all elements of the set (i.e., we
replace a set $O = \{o_1, \ldots, o_n\}$ by an object $o$, such
that $P(o)$ holds iff $P(o_i)$ holds for every $o_i \in O$).
For example, when reasoning about chemical
reactions, it is enough to consider only one
representative molecule of every type in the chemical formula,
and that suffices to describe the complete reaction
between the substances.

• Predicate representative: We replace a set of
predicates $\mathcal{P}$ by an abstract predicate that
represents their intersection.

• Macro rule: We replace a set of facts $S$ by a logical
consequence $s$ of $S$.

Conclusions

We presented a general formal framework for
analyzing the notion of irrelevance. The framework contains
a space of possible definitions of irrelevance claims that
enabled us to formalize previous definitions (e.g.,
[Subramanian, 1989]) and present new ones. We identified
several important properties of irrelevance claims and
demonstrated how these properties change as we move
in the space of definitions. The framework enabled
us to formulate irrelevance claims that serve as justifications for
abstractions, thereby providing a new view on work
in abstractions. Justifying abstractions by irrelevance
claims provides a first-principles [Subramanian, 1989]
account of abstractions, elucidating questions such as
automatically creating abstractions, creating
abstractions that are specific for a given goal, and using domain
knowledge to guide the creation of abstractions. This
paper presents only initial work in this direction,
and much remains to be explored.

Research Summary


My research is focussed on studying the notion of irrelevance in different 
areas of problem solving. I have developed a general framework in which 
one can study the issues concerning irrelevance and how it can be applied to 
problem solving. The framework has been applied to three different areas. 

The first is in the context of deductive databases [LS92]. There we have 
shown how analyzing the notion of irrelevance allows us to identify new kinds 
of redundancies in Datalog programs (Horn rules without function symbols), 
and we have given new algorithms for detecting such redundancies. 

The second is in work on abstractions. Much of the work in AI on creating 
abstraction hierarchies and problem reformulation is based on the intuition 
that we abstract a problem by removing irrelevant detail. [Lev92] shows how 
such irrelevance claims can be formalized in my framework. Doing so enables 
one to explicitly state justifications for abstractions, therefore enabling the 
system to reason about the abstraction hierarchies it creates for a given goal. 
In addition, algorithms for automatically deducing irrelevance can be used 
to automatically create abstractions. 

The third is the problem of modeling physical
devices [LIM92]. When a system chooses a model for a device given some 
task such as diagnosis, design or simulation, it needs to make decisions about 
which aspects of the device are relevant to the current task. We have shown 
how heuristics for guiding model selection can be stated as irrelevance claims 
in the language provided by the framework. We were able to express heuris- 
tics that have been discussed in the literature and to come up with new ones 
motivated by the vocabulary existing in the framework. 

Abstract

Our research is investigating abstraction of 
computational theories for scheduling and resource 
allocation. These theories are represented in a variant 
of first order predicate calculus, parameterized multi- 
sorted logic, that facilitates specification of large 
problems. A particular problem is conceptually stated 
as a set of ground sentences that are consistent with a 
quantified theory. We are mainly investigating the 
automated generation of aggregation abstractions and 
approximations in which detailed resource allocation 
constraints are replaced by constraints between 
aggregate demand and capacity. We are also
investigating the interaction of aggregation 
abstractions with the more thoroughly investigated 
abstractions of weakening operator preconditions. The 
purpose of the theories for aggregated demand/capacity 
is threefold: first, to answer queries about aggregate 
properties, such as gross feasibility; second, to reduce 
computational costs by using the solution of 
aggregate problems to guide the solution of detailed 
problems; and third, to facilitate reformulating theories 
to approximate problems for which there are efficient 
problem solving methods. We also describe novel 
methods for exploiting aggregation abstractions. 

Motivation 

Domain specific planning and scheduling systems have 
achieved a modicum of real world success, and current 
efforts are aimed at vastly increasing the size and 
complexity of problems which can be handled with 
knowledge-based technology. We believe that much of the 
power of domain-specific planning and scheduling systems 
comes from their use of specialized algorithms at different 
levels of abstraction. For example, a resource allocation 
problem can often be approximated as linearized upper and 
lower bounds at a high level of abstraction, and solved 
using linear programming methods in order to identify 
bottleneck resources. Domain-specific scheduling systems 
use many different kinds of abstraction, not just the 
abstraction hierarchies defined by dropping literals from
operator preconditions, as is the case for ABSTRIPS and
most of its progeny. In particular, for large scheduling and 
resource allocation problems whose computational 
complexity is characterized by resource contention between 
many separate tasks, aggregation abstractions of demand 
and resource capacity play a more dominant role than 
abstraction of operator preconditions. An example is to 
aggregate all transportation capacity into a single linear 
quantity - total cargo volume. However, the drawback of 
domain-specific systems is their lack of flexibility and the 
necessity of hand-coding the knowledge. 

The objective of our research is to develop the 
technology for dynamically 'compiling' domain-specific 
scheduling systems from declarative specifications and the 
subgoals and constraints that arise during planning and 
scheduling. The goal is to achieve the efficiency of hand- 
coded domain-specific systems but at the same time 
maintain the benefits of domain independent systems which 
interpret declarative problem specifications. The benefits of 
the latter arise from their generality: because the 
assumptions are explicit rather than hard-coded, the system 
is more widely applicable, the declarative representation is 
more transparent and thus more trusted and more easily 
validated, and furthermore the representation is more easily 
modified as requirements evolve. The automated synthesis 
and selection of abstractions is a key component to 
enabling domain-specific systems to be compiled from 
declarative specifications. 

The next section of this paper describes the underlying 
semantics we are using for abstractions, approximations, 
and aggregations. The subsequent section describes the 
techniques we are developing for generating abstractions and 
approximations. The final section describes new techniques 
for exploiting aggregation approximations and abstractions. 

Semantics of Abstractions, 
Approximations, and Aggregations 

Semantically, we define an abstraction as a (possibly 
partial) mapping from the models of one theory to the 
models of another theory. We assume that these mappings 
are transitive and reflexive. If a mapping is total and can be 
inverted, then the two theories it relates are isomorphic. 

The intended semantics of a theory determine the
appropriate constraints on a valid abstraction. We define the 
appropriate abstraction constraints through the converse 
mapping: implementation. For loose specifications, the 
intended semantics is any model satisfying the theory, 
hence a valid implementation is a mapping from the 
models of the implementing theory into (but not 
necessarily onto) the models of the loose specification. 
This is compatible with definitions of abstraction as theory 
generalization, i.e. a widening of the class of models. For 
this type of semantics, implementation is the same as 
theory refinement. 

For tight specifications, the intended semantics is a 
minimal model (up to isomorphism). Minimal model 
semantics correspond to the operational semantics of most 
types of logic programming. Minimal model semantics are 
also useful for succinctly axiomatizing models with 
inductive types, the simplest being the natural numbers. 
Hence the appropriate constraint on abstraction is a 
mapping from a single concrete model to a single abstract 
model. A third type of specification is parameterized,
consisting of a parameter theory, which has many 
interpretations, and a body which extends the parameter 
theory. The usual intended semantics for this type of 
specification is to take all the models of the parameter 
theory, and then to extend each one with additional objects, 
functions and relations such that this extension is minimal 
with respect to all possible extensions consistent with the 
body. This third type of semantics is best seen as a tight 
specification for each model of the parameter theory. This 
type of semantics is most useful when a general 
specification is given for a whole class of problems. 

We have axiomatized various types of generic resources 
using parameterized specifications: consumable resources 
(such as fuel), reusable non-shareable resources (such as a 
landing strip), synchronized-shareable resources (such as a 
cargo ship), and independent-shareable resources (such as a 
parking lot). A particular domain theory is built up by 
composing instantiations of these generic parameterized 
resource theories with particular resources. 

Syntactically, an abstraction is defined through two 
theories and a set of definitions for abstraction functions 
from the objects and operations of the concrete model(s) to 
the objects and operations of the abstract model(s). The 
abstraction functions are defined in the syntax of the union 
of the abstract and concrete theories. 

Approximations arise from weakening or strengthening 
the criteria for models of a theory. In the context of our 
research, this weakening/strengthening is always with 
respect to queries or goals. For example, if the goal is to 
transport cargo from one country to another given a certain 
set of resources, then the satisfaction of a strengthened 
approximation guarantees the transportation feasibility of 
the original, while the non-satisfaction of a weakened 
approximation guarantees the transportation infeasibility of 
the original. Strengthening and weakening occur not only 
with respect to the truth of sentences but also with respect 
to any partial order, such as the total order on the reals
(true/false defines a partial order on the booleans). For
example, approximations can also be upper and lower 
bounds on resources required for transportation feasibility. 
Given a complex query or goal it is necessary to map 
strengthening/weakening of the whole into 
strengthening/weakening of the parts. The polarity analysis 
for sentences [Manna & Waldinger 86] has been extended 
to a polarity analysis for any type of formula ranging over 
any domain with a partial order [Smith 92]. Thus given a 
complex query with a specified direction of strengthening or 
weakening, the constraints on the strengthening/weakening 
of the functions and relations in the query can be 
mechanically derived. 
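
The propositional core of polarity analysis can be sketched as follows. The sketch covers only the classical sentence-level case (it does not attempt the extension to arbitrary partial orders), the tuple encoding of formulas is an assumption made for illustration, and the atom names are hypothetical.

```python
from typing import Dict, Optional, Set


def polarities(formula: tuple, sign: int = 1,
               found: Optional[Dict[str, Set[int]]] = None) -> Dict[str, Set[int]]:
    """Collect the polarity (+1 or -1) of every atom occurrence in a formula.

    Formulas are nested tuples: ("atom", name), ("not", f), ("and", f, g),
    ("or", f, g) and ("implies", f, g).
    """
    if found is None:
        found = {}
    op = formula[0]
    if op == "atom":
        found.setdefault(formula[1], set()).add(sign)
    elif op == "not":
        polarities(formula[1], -sign, found)   # negation flips polarity
    elif op in ("and", "or"):
        polarities(formula[1], sign, found)
        polarities(formula[2], sign, found)
    elif op == "implies":
        polarities(formula[1], -sign, found)   # the antecedent is negative
        polarities(formula[2], sign, found)
    else:
        raise ValueError(f"unknown connective: {op}")
    return found


# Hypothetical goal: enough lift is available and the port is not blocked.
goal = ("and", ("atom", "enough_lift"), ("not", ("atom", "port_blocked")))
print(polarities(goal))  # {'enough_lift': {1}, 'port_blocked': {-1}}

mixed = ("implies", ("atom", "a"), ("and", ("atom", "a"), ("atom", "b")))
print(polarities(mixed))  # {'a': {1, -1}, 'b': {1}}
```

To strengthen the whole goal, an atom that occurs only positively may be strengthened and an atom that occurs only negatively may be weakened; an atom with both polarities, such as `a` above, must be left unchanged.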

Aggregations are mappings from collections of objects 
with their individual attributes to a whole representing the 
collection with attributes for the collective. In theory,
aggregations can arise as equivalent conditions for 
satisfaction of a goal. For example, in order for a chemical 
reaction to occur in a solution the individual molecules 
must have sufficient kinetic energy. This constraint on the 
attributes of individual molecules can be reformulated into 
an equivalent constraint on the temperature of the whole 
solution. In the context of the research reported in this 
paper aggregations are most often approximations with 
respect to a query or goal. 

Generation of Aggregation Abstractions 

We are using two techniques for automatic generation of 
aggregation approximations. The first is based on analysis 
of behavioral equivalence: given a goal, two objects are 
behaviorally equivalent if they can be mutually substituted 
for each other in the achievement of the goal. For example, 
for the goal of transporting a rifle division, two small cargo 
planes are behaviorally equivalent to one large cargo plane. 
However, for transporting a heavy armor division only the 
large cargo plane has a sufficient girth for tanks and heavy 
artillery. Thus for transporting heavy armor divisions two 
small cargo planes are not behaviorally equivalent to a 
single large cargo plane. This simple example illustrates 
that abstractions such as total lift capacity must be 
dependent on context in order to be useful. 

The result of behavioral equivalence analysis is the 
definition of an equivalence relation on the objects of a 
domain. In an abstract theory, behaviorally equivalent 
objects are identified. Syntactically, the equivalence relation 
in the concrete theory is transformed to an equality relation 
in the abstract theory. A number of issues arise in ensuring 
that the transformation from a behavioral equivalence 
relation to an equality relation is semantically well defined, 
particularly for inductively defined types. These issues are 
addressed in [Lowry 1989, Lowry 1990].

When behavioral equivalence analysis is applied to a set 
of goals, or to a complex domain theory with many 
constraints, the result will be a set of behavioral 
equivalences. (The behavioral equivalence for the 
conjunction of the goals is the intersection of the individual 
equivalence relations.) This set can be ordered by inclusion, 
defining a partial order on behavioral equivalence relations. 
Upper and lower bounds exist for each pair of behavioral
equivalence relations; a lattice is defined by including the
universal equivalence relation (all objects equivalent) and
the identity equivalence relation (each object is equivalent
only to itself). This lattice can be more densely ordered by
inheriting ordering relations on the goals and constraints.
For example, ABSTRIPS-type orderings on literals in the
preconditions of operators derived through various programs 
[Tenenberg 89; Knoblock 89, 90] can be used to order their 
corresponding behavioral equivalence relations. 
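
The intersection and the inclusion ordering of behavioral equivalence relations can be sketched by representing each relation as a partition of the objects. The aircraft data and attribute choices below are hypothetical, and equivalence is simplified to agreement on one attribute per goal.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Set

Partition = Set[FrozenSet[str]]


def partition_by(objects: Iterable[str], key: Callable[[str], Hashable]) -> Partition:
    """Equivalence classes of the relation 'has the same key'."""
    groups: Dict[Hashable, Set[str]] = {}
    for obj in objects:
        groups.setdefault(key(obj), set()).add(obj)
    return {frozenset(group) for group in groups.values()}


def intersect(p: Partition, q: Partition) -> Partition:
    """Classes of the intersection of two equivalence relations."""
    return {a & b for a in p for b in q if a & b}


def refines(p: Partition, q: Partition) -> bool:
    """True if the relation of p is included in the relation of q."""
    return all(any(a <= b for b in q) for a in p)


# Hypothetical aircraft: name -> (lift in tons, cargo bay fits tanks).
planes = {
    "small_1": (20, False),
    "small_2": (20, False),
    "wide_1": (20, True),
    "wide_2": (40, True),
}

by_lift = partition_by(planes, lambda name: planes[name][0])   # goal 1: lift only
by_girth = partition_by(planes, lambda name: planes[name][1])  # goal 2: girth only
both = intersect(by_lift, by_girth)                            # conjunction of goals

print(sorted(sorted(cls) for cls in both))
# [['small_1', 'small_2'], ['wide_1'], ['wide_2']]
print(refines(both, by_lift), refines(both, by_girth), refines(by_lift, both))
# True True False
```

The intersection sits below both individual relations in the inclusion order, which is the lower bound used when building the lattice.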

The second technique for generation of aggregation 
approximations is through the use of bounding 
approximations: given a goal or query, and an abstract 
domain theory obtained through behavioral equivalence 
analysis, an extended polarity analysis is applied to the 
goal(s) with respect to the abstract domain theory. Various 
kinds of symbolic bounding approximations are derived by 
KIDS through this polarity analysis, which is currently 
implemented through a transitive rewriting technique on 
formulas called directed inference [Smith 90]. These 
approximations include necessary conditions, sufficient 
conditions, symbolic upper bounds, and symbolic lower 
bounds. These approximation functions are composed with 
the abstraction functions derived through behavioral 
equivalence analysis to yield the abstraction 
(approximation) functions that map the concrete domain 
onto the abstract domain. 

To derive aggregation approximations, the domain theory 
must include generic axioms defining the relation between 
aggregate constraints and constraints on individuals. Most 
of the resources we are considering are linear: resources 
have time-varying capacities which at each instance cannot 
be exceeded by the sum of the consumers assigned to that 
resource. The axioms for these kinds of generic aggregation 
constraints, together with a particular domain theory, are 
transformed by the polarity analysis into definitions of 
possible aggregation approximations. 

Like the behavioral equivalence abstractions, the 
aggregation approximations derived through polarity 
analysis can be ordered by strength. Furthermore, directed 
inference, because it is a transitive rewriting technique, 
automatically generates part of this ordering relation. The 
composition of the lattice of behavioral equivalence 
relations and the ordering on aggregation approximations 
again yields a lattice. We are investigating techniques for 
this composed lattice to be implicitly defined rather than 
explicitly generated. 

Exploiting Resource Abstractions 

For very large resource allocation and scheduling problems 
that are solved interactively, some form of greedy algorithm 
is often appropriate. In this section, we illustrate the use 
of resource abstraction hierarchies to enhance the 
opportunities for finding a linear ordering of allocation and 
scheduling decisions that achieves good results within the 
context of a greedy algorithm. The same resource 
abstraction hierarchies can also be used to enhance the 
variable ordering heuristics used with other search 
strategies. 

The example described later in this section shows that 
resource abstractions can enable a successful linear ordering 
of resource allocation decisions where no such ordering 
exists without the abstractions. The resource abstractions 
allow allocation decisions (assignments of resources to 
operations) to be made in steps down the resource 
abstraction hierarchy; first an abstract resource is allocated 
to an operation, later, this decision is refined to a more 
specific resource. Each allocation of an abstract resource is 
essentially a reservation for an unspecified instance of that 
abstract group of resources. By making the reservation 
early while deferring more specific resource choices, the 
deferred decisions can take advantage of information that 
becomes available as other decisions are made. 

The cost of using resource abstraction hierarchies is the 
additional checking needed to maintain consistency of the 
allocation as it is built incrementally. Each allocation 
decision must be checked to ensure that it does not overuse 
the resource that is being assigned. With abstract 
resources, each decision must be checked not only against 
the resource being assigned but also against the more 
abstract resources. 

To formalize these concepts, assume we have a resource 
abstraction hierarchy in the form of a lattice (R, ⪰) where 
R is a set of resource types and ⪰ is the extends 
relation, a partial ordering defined on R meaning 
“is more specific than (or equal to).” In the example 
discussed later in this section, the specific resources are 
seaports and R consists of individual seaports and selected 
sets of seaports. The lattice relation on R is membership 
or subset. For example, Norfolk ≻ tank-loading seaports ≻ 
East Coast seaports. Similarly, Baltimore ≻ mid-Atlantic 
seaports. 
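
The consistency check against more abstract resources can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation: the hierarchy, the capacities, the consumer `FM99`, and the class `Allocation` are all assumptions. Each assignment is counted against the assigned resource and every more abstract resource above it, so a reservation on an abstract resource holds a slot open among its members. The check shown is a necessary condition only; it is not claimed to be sufficient for arbitrary lattices.

```python
from collections import Counter

# Hypothetical "extends" relation: resource -> directly more abstract resources.
PARENTS = {
    "Norfolk": ["tank-loading", "mid-Atlantic"],
    "Baltimore": ["mid-Atlantic"],
    "tank-loading": ["East Coast"],
    "mid-Atlantic": ["East Coast"],
    "East Coast": [],
}
# Hypothetical capacities (number of force modules each node can serve).
CAPACITY = {"Norfolk": 1, "Baltimore": 1, "tank-loading": 1,
            "mid-Atlantic": 2, "East Coast": 2}

def ancestors(resource):
    """The resource itself plus every more abstract resource above it."""
    seen, stack = [], [resource]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(PARENTS[node])
    return seen

class Allocation:
    def __init__(self):
        self.load = Counter()   # demand counted at each node and its ancestors
        self.assigned = {}      # consumer -> resource (possibly abstract)

    def _commit(self, consumer, resource):
        for r in ancestors(resource):
            self.load[r] += 1
        self.assigned[consumer] = resource

    def _release(self, consumer):
        for r in ancestors(self.assigned.pop(consumer)):
            self.load[r] -= 1

    def assign(self, consumer, resource):
        """Assign or refine; on overuse return False and restore the old state."""
        previous = self.assigned.get(consumer)
        if previous is not None:
            self._release(consumer)
        if all(self.load[r] + 1 <= CAPACITY[r] for r in ancestors(resource)):
            self._commit(consumer, resource)
            return True
        if previous is not None:
            self._commit(consumer, previous)
        return False

plan = Allocation()
plan.assign("FM15", "mid-Atlantic")          # reservation for an unspecified port
plan.assign("FM35", "Norfolk")               # one mid-Atlantic slot is still free
blocked = plan.assign("FM99", "Baltimore")   # would consume FM15's reserved slot
plan.assign("FM15", "Baltimore")             # refine the reservation
```

The third call is rejected at the abstract level even though Baltimore itself is free, which is the extra checking cost the text describes.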

Given a set of specific resources, the power set of these 
resources is a candidate lattice; however, this is typically a 
bad candidate because it would involve exponential cost in 
checking resource assignments against the more abstract 
nodes. While a narrow lattice can be developed at design 
time, it is likely to be far more effective to choose 
abstractions that are tailored to the specific problem 
instance using the techniques discussed previously of 
behavioral equivalence analysis and aggregation 
approximations. 

Figure 1 depicts a small portion of a simplified problem 
involving military crisis action deployments. A large 
number of force modules (such as all the equipment 
associated with a brigade) are to be shipped through East 
Coast ports. Two of the hundreds of such force modules 
(FM15 and FM35) are identified in the figure. The 
problem addressed in this example is to plan the 
deployment without excessive congestion at the ports 
(other aspects of the overall problem involve scheduling of 
transports and other resources). 

Figure 1: Problem of assigning seaports to transportation 
tasks 


The two force modules can be shipped through any of 4 
different ports, labeled B, N, S, and J. The dashed lines 
between the force modules and the ports that are suitable for 
shipment are labeled with the utility of using this route. 
That utility summarizes the current state of information 
about the cost of ground transportation to the port, the 
locations of available transport ships, characteristics of the 
ports, and other factors. 

If this port selection problem could be isolated from 
other aspects of the overall crisis action planning problem, 
it could be solved by existing OR algorithms. 
Unfortunately, there are many complex dependencies 
between the port selection and the scheduling of other 
resources that require the port selection decisions to be 
interleaved with other decisions within an overall multi-user, 
interactive construction paradigm. Within this 
context, the port selection is done with a greedy algorithm, 
and a goal is to use the locally available information to 
order the allocation decisions. 

This example focuses on deciding whether to choose a 
port for FM15 before or after choosing one for FM35. 
Without abstraction levels, neither ordering can be 
successful under all interactions with other demands on the 
port resources. If we use the simple greedy approach and 
assign to FM15 first because it can achieve a higher utility, 
then the arbitrary choice that has to be made between B and N 
may prevent FM35 from obtaining its best choice. (We 
can't choose B knowing that FM35 prefers N because we 
are assuming there are many other force modules that are 
also competing for both B and N, and many of them may 
have higher priority than FM35.) On the other hand, if 
the assignment is made to FM35 first, it may choose port 
N, but that could be the cause of FM15 not getting either 
of its good choices. This problem of ordering these two 
allocation decisions (and the decisions about all other pairs 
of force modules) might be avoided by using statistical 
look-ahead techniques [Fox 89] that project the contention 


at the ports and may allow FM15 to choose first and make 
a non-arbitrary choice between B and N. Separating the 
decisions across abstraction levels gives additional 
opportunities to be successful with a greedy algorithm, and 
appears to be especially useful in conjunction with 
statistical look-ahead techniques. Note that no technique 
can make a greedy algorithm successful all the time. 

Two abstractions on the seaports are shown in Figure 
1: mid-Atlantic ports and M-1 loading ports. For the 
abstractions to be effective, the utility function should 
often be more homogeneous across different instances of 
the abstract resource than across the entire domain of the 
resources. This can be a goal when creating abstractions 
dynamically. 

The mid-Atlantic port abstraction enables an ordering of 
the port allocation decisions for these two force modules 
that will almost always be successful: 

1) Reserve a mid-Atlantic port for FM15 (assuming it 
competes successfully with other force modules for these 
ports). 

2) Reserve an M-1 loading port for FM35 (assuming it 
competes successfully with other force modules for these 
ports). 

3) Assign N to FM35 assuming it preserves all 
reservations for both mid-Atlantic and M-1 loading ports 
and competes successfully for N with the other force 
modules. 

4) Assign to FM15 whichever instance of a mid-Atlantic 
port is left over. (The reservation guarantees that 
some mid-Atlantic port will be left for FM15.) 

The reservation that FM15 has for a mid-Atlantic port 
allows a lower priority force module to be given precedence 
over the higher priority module in step three, as long as 
the reservation is preserved. 

This four step ordering of the decisions about these two 
force modules often achieves a good solution without 
backtracking. Similar reasoning for other pairs of force 
modules can be used to order the other decisions made by a 
greedy algorithm — but there are no guarantees that a good 
ordering can be found for all pairs of decisions. 

Abstract 

This paper describes a rational reconstruction of 
Einstein’s discovery of special relativity, validated 
through an implementation: the Erlanger program. 
Einstein’s discovery of special relativity 
revolutionized both the content of physics and the 
research strategy used by theoretical physicists. This 
research strategy entails a mutual bootstrapping 
process between a hypothesis space for biases, defined 
through different postulated symmetries of the 
universe, and a hypothesis space for physical theories. 
The invariance principle mutually constrains these two 
spaces. The invariance principle enables detecting 
when an evolving physical theory becomes 
inconsistent with its bias, and also when the biases for 
theories describing different phenomena are 
inconsistent. Structural properties of the invariance 
principle facilitate generating a new bias when an 
inconsistency is detected. After a new bias is 
generated, this principle facilitates reformulating the 
old, inconsistent theory by treating the latter as a 
limiting approximation. The structural properties of 
the invariance principle can be suitably generalized to 
other types of biases to enable primal-dual learning. 

Introduction 

Twentieth century physics has made spectacular 
progress toward a grand unified theory of the universe. 
This progress has been characterized by the development of 
unifying theories that are then subsumed under even more 
encompassing theories. Paradigm shifts are nearly routine, 
with the postulated ontology of the universe changing from 
the three dimensional absolute space of Newtonian physics, 
to the four dimensional space-time of relativistic physics, 
and through many other conceptual changes to current 
string theories embedded in ten dimensions. Theoretical 
physicists attribute much of the success of their discipline 
to the research strategy first invented by Einstein for 
discovering the theory of relativity [Zee 86]. 

At the heart of Einstein’s strategy was the primacy of 
the principle of invariance: the laws of physics are the 
same in all frames of reference. This principle applies to 
reference frames in different orientations, displaced in 
time and space, and moreover to reference frames in 
relative motion. This principle also applies to many other 
aspects of physics, including symmetries in families of 
subatomic particles. The application of the invariance 
principle to “two systems of coordinates, in uniform 
motion of parallel translation relatively to each other” was 
Einstein’s first postulate: the principle of special relativity 
[Einstein 1905]. 

Einstein’s genius lay in his strategy for using the 
invariance principle as a means of unifying Newtonian 
mechanics and Maxwell’s electrodynamics. This strategy 
of unifying different areas of physics through the 
invariance principle is responsible for many of the 
advances of theoretical physics. In the parlance of current 
machine learning theory, Einstein’s strategy was to 
combine the principle of special relativity with his second 
postulate, the constancy of the speed of light in a vacuum, 
to derive a new bias. (This second postulate is a 
consequence of Maxwell’s equations; [Einstein 1905] notes 
that experimental attempts to attribute it to a light medium 
were unsuccessful.) This new bias was designed and 
verified to be consistent with Maxwell’s electrodynamics, 
but was inconsistent with Newton’s mechanics. Einstein 
then reformulated Newton’s mechanics to make them 
consistent with this new bias. He did this by treating 
Newton’s mechanics as a limiting approximation, from 
which the relativistic laws were derived through 
generalization by the new bias. 

Einstein’s strategy is a model for scientific discovery 
that addresses a fundamental paradox of machine learning 
theory: to converge on a theory from experimental 
evidence in non-exponential time, it is necessary to 
incorporate a strong bias [Valiant 84], but the stronger the 
bias the more likely the ‘correct’ theory is excluded from 
consideration. Certainly any conventional analysis of what 
could be learned in polynomial time would exclude a grand 
unified theory of physics. The paradox can be avoided by 
machine learning algorithms that have capabilities for 
reasoning about and changing their bias. Even if a strong 
bias is ultimately ‘incorrect’, it is still possible to do a great 
deal of useful theory formation before the inconsistencies 
between the bias and empirical facts become a limiting 
factor. The success of the Galilean/Newtonian framework 
is an obvious example. To avoid the paradox, a machine 
learning algorithm needs to detect when a bias is 
inconsistent with empirical facts, derive a better bias, and 
then reformulate the results of learning in the incorrect bias 
space into the new bias space [Dietterich 91]. The Erlanger 
program described in this paper is such an algorithm. 

Einstein’s strategy is essentially a mutual bootstrapping 
process between two interrelated hypothesis spaces: a 
space for biases, and a space for physical theories. The 
invariance principle defines the space of biases; each bias 
is a different postulated set of symmetries of the universe, 
formalized through a group of transformations. The 
invariance principle also defines a consistency relationship 
that mutually constrains the bias space and the space for 
physical theories. The hypothesis space for biases has a 
rich lattice structure that facilitates generating a new bias 
when a shift of bias is necessary. The hypothesis space for 
physical theories has an approximation relation between 
theories (limit homomorphisms) that, after a shift in bias, 
facilitates generating a new theory from an old 
(approximate) theory and the new bias. The entire process 
converges if learning in the bias space converges. 

This paper builds upon the considerable body of 
literature on relativity and the role of symmetry in modern 
physics. Its contribution includes identifying and 
formalizing the structural relationships between the space 
of biases and the old and new theories that enabled 
Einstein’s strategy to succeed, in other words, that made it 
computationally tractable. The tactics for carrying out the 
components of this strategy have been implemented in the 
Erlanger program, written in Mathematica v.1.2. 

The next section of this paper presents an overview of 
Einstein's strategy. The following section introduces the 
invariance principle, which determines the consistency 
relationship between a bias and a physical theory. It also 
describes the procedure for detecting inconsistency. The 
following section presents the tactic for computing a new 
bias using the invariance principle. It takes the reader 
through the Erlanger program’s derivation of the Lorentz 
transformations. The section after that defines limit 
homomorphisms, a formal semantics for approximation. 
The following section describes BEGAT: BiasEd 
Generalization of Approximate Theories, an algorithm that 
uses the invariance principle and the semantics of limit 
homomorphisms to generate components of the new 
theory. The paper concludes with a generalization of 
Einstein's strategy called primal-dual learning, which 
might be applied to other types of biases. 


Overview of Einstein's Strategy 

Einstein’s strategy for deriving special relativity will 
first be explained through an analogy with symmetries and 
tangents of geometric figures. Then the structural 
components of the invariance principle interrelating the 
bias space and the space of physical theories will be 
outlined and the overall research strategy described with 
respect to these components. The next section will describe 
the mathematics of the invariance principle as it applies to 
theories of physics. 

Symmetry and Group Theory 

The symmetries of a geometric figure are invertible 
transformations that map the figure to itself. For example, a 
square is mapped to itself by various transformations about 
its center: horizontal reflections, vertical reflections, and 
ninety degree rotations. Because these transformations are 
invertible, they form a group. 

A group is any set with a constant identity element, a 
binary operation defined on any two elements, and an 
inverse operation mapping any element to its inverse. A 
transformation group consists of elements which are 
transformations of some other set S; each transformation is 
a bijection from S to S. A transformation T defined on S is 
an automorphism of a subset F of S iff T(F) = F. Hence if 
S is the two dimensional plane and F is a geometric figure 
such as a square, then the symmetries of F are those 
transformations T such that T(F) = F. Restrictions can be 
placed on the transformations considered; for example, 
transformations that preserve topological structure are 
called homeomorphisms while transformations that 
preserve distance are called isometries. The isometries of a 
square include horizontal reflections, vertical reflections, 
and multiples of ninety degree rotations about its center. 
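
The group property of these isometries can be checked mechanically. The sketch below is an illustration only (the vertex numbering and helper names are assumptions, and it is unrelated to the Erlanger program's code): it encodes each symmetry of a square as a permutation of its four vertices, generates the set from one ninety degree rotation and one reflection, and verifies closure and inverses. Composing rotations with a reflection also produces the two diagonal reflections, giving eight symmetries in all.

```python
from itertools import product

def compose(p, q):
    """Permutation that applies q first and then p (tuples map index i to p[i])."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(generators):
    """All permutations reachable by repeatedly composing the generators."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            candidate = compose(h, g)
            if candidate not in group:
                group.add(candidate)
                frontier.append(candidate)
    return group

# Vertices of the square numbered 0..3 counterclockwise (an assumed convention).
IDENTITY = (0, 1, 2, 3)
ROTATE_90 = (1, 2, 3, 0)   # each vertex moves to the next position
REFLECT = (1, 0, 3, 2)     # mirror across the bisector of edge 0-1

square_symmetries = generate([ROTATE_90, REFLECT])

closed = all(compose(a, b) in square_symmetries
             for a, b in product(square_symmetries, repeat=2))
has_inverses = all(any(compose(a, b) == IDENTITY for b in square_symmetries)
                   for a in square_symmetries)
```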

Symmetries can be represented through transformation 
equations; for example, the equations for a rotation of $\theta$ 
degrees about the origin in two dimensions define new 
primed coordinates for each point in terms of the original 
coordinates: $x' = x\cos\theta - y\sin\theta$, $y' = x\sin\theta + y\cos\theta$. If 
$\theta$ is a constant, then these equations represent a single 
transformation. If $\theta$ is a parameter, then these equations 
represent a set of transformations. Note that for any $\theta$, a 
circle with its center at the origin is mapped to itself. Hence 
these equations denote a set of automorphisms of all 
origin-centered circles. One way to prove this algebraically is to 
solve for the equation of a circle, i.e., reduce $x^2 + y^2 = r^2$ 
to a set of functions for y in terms of x for different 
quadrants, plug the definitions of these functions into the 
transformation equations, and then show that the new 
points also satisfy these equations. 

The method implemented in the Erlanger program is 
slightly different because it is based upon an equivalent but 
alternative approach to defining symmetries. (See 
[Friedman 83] for a thorough analysis of the relation 
between these two approaches, as applied to space-time 
theories.) Instead of viewing the transformations as 
mappings from points to points within a single reference 
frame, the transformations are viewed as mappings 
between reference frames. A figure is symmetric if it 
appears exactly the same in the new reference frame as it 
does in the old reference frame. In this alternative 
approach, the transformation equations are the same except 
that the sign of the parameter is inverted, because rotating 
the reference frame by $\theta$ about the origin is equivalent to 
rotating the figure by $-\theta$ about the origin: 
$x' = x\cos\theta + y\sin\theta$, $y' = -x\sin\theta + y\cos\theta$. 
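
The two views of a rotation can be compared numerically. The following is a small illustrative check, assuming angles in radians and hypothetical function names; it is a numeric spot check on sampled points, not the symbolic method used by the Erlanger program. It confirms that both forms of the equations map an origin-centered circle to itself and that rotating the frame by an angle matches rotating the point by the negated angle.

```python
import math

def rotate_point(x, y, theta):
    """Point-to-point view: rotate the point by theta about the origin."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return x * cos_t - y * sin_t, x * sin_t + y * cos_t

def rotate_frame(x, y, theta):
    """Frame-to-frame view: coordinates of the same point in a frame rotated by theta."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return x * cos_t + y * sin_t, -x * sin_t + y * cos_t

def on_circle(x, y, r):
    """True if (x, y) satisfies x^2 + y^2 = r^2 up to rounding error."""
    return math.isclose(x * x + y * y, r * r, rel_tol=1e-9)

radius = 2.0
angles = [k * math.pi / 6 for k in range(12)]
points = [(radius * math.cos(a), radius * math.sin(a)) for a in angles]

# Every sampled rotation, in either view, keeps every sampled point on the circle.
circle_is_symmetric = all(
    on_circle(*rotate_point(x, y, theta), radius)
    and on_circle(*rotate_frame(x, y, theta), radius)
    for (x, y) in points
    for theta in angles
)
```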


An Analogy to Einstein's Strategy 



Figure 1. 

Einstein's strategy for deriving special relativity is 
illustrated through the simple geometric analogy in Figure 
1. Newton's mechanics is represented by the circle on the 
right; its set of symmetries is all rotations and reflections 
about its center. This set of symmetries is inconsistent with 
the invariance of the speed of light, a deductive 
consequence of Maxwell's electrodynamics that is 
represented by the small bold circle on the left. 

Einstein derived the set of symmetries consistent with 
the constant speed of light by first generalizing from the 
particular circle representing Newton's mechanics to 
symmetries for all possible circles, i.e., rotations and 
reflections about all possible centers. He then specialized 
this set of all possible circular symmetries by solving for 
the center of the circle consistent with the constant speed of 
light. This new symmetry was verified to be consistent with 
Maxwell's electrodynamics. 

Einstein then derived relativistic mechanics, represented 
by the larger left circle, through two constraints: that it be 
circularly symmetric around the same center as 
electromagnetic phenomena, and that it be tangent to 
Newton's mechanics as relative velocities approach 0. Note 
that Newton's mechanics had only been empirically 
verified at low velocities compared to light; the rest of the 
circle was assumed from the originally postulated 
symmetries dating back to Galileo. In this manner Einstein 
unified electromagnetism and mechanics under the same 
set of symmetries while still accounting for the wealth of 
experimental confirmations of Newton's theory at low 
velocities compared to the speed of light. Although 
simplistic, this geometric analogy captures the essential 
extensional relationships between Newton's mechanics, 
Maxwell's electromagnetism, and relativistic mechanics. 

One of the crucial facts about symmetry as bias is that 
the groups corresponding to different figures form a lattice 
ordered by the subset relation. (More generally, the 
ordering is defined through group homomorphisms.) There 
is a contravariant relation between the complexity of an 
object and its set of symmetries. For example, a square is 
more complex than a circle, hence the group of 
transformations for a square is a subset of the group of 
transformations for a circle. As explained in the next 
section, this relation between geometric figures and their 
symmetries also holds between theories of physics and 
their symmetries. This contravariant relation is essential to 
the bootstrap learning of Einstein's strategy. 

Structural Relations Exploited in Einstein's 
Strategy 

Figure 2 illustrates the structural relations between the 
bias space and the space for physical theories that were 
exploited by Einstein, and indicates how these same 
structural relations might be exploited in other types of 
bias. 



Figure 2 (node labels: Newtonian Mechanics, Relativistic Mechanics) 

1. The diamond represents a space of biases for physical 
laws. The biases are different postulated symmetries of the 
universe. As modern physics has evolved, the bias has 
evolved. Each bias in this space is formalized as a 
transformation group. 

2. The consistency relationship between a bias (a 
transformation group) and a physical theory is represented 
by a solid black line. The diagram illustrates that 
Newtonian mechanics is consistent with the Galilean 
transformation group. 

3. When an inconsistency is detected between an 
experimental fact and the current bias, then a new bias is 
computed. The new bias is computed by combining the 
new observation, an upper bound (represented by a hollow 
circle) and lower bounds (represented by a solid black 
circle). The upper bound is a superset of transformations 
that constrains the types of transformations that are 
considered. The transformations in this superset that are 
consistent with the new observation are selected for the 
new bias. This selection is done by symbolically solving 
for those transformations that are consistent with the new 
observation, rather than enumerating over all the 
transformations in the upper bound. The calculation is 
simplified through the use of lower bounds. Einstein 
derived the Lorentz transformations through this procedure. 

4. Laws in the new hypothesis space are constrained to 
be consistent with the new bias and also to have, as a 
limiting approximation, the laws in the old hypothesis 
space. This limiting approximation is indicated by the 
arrow from relativistic mechanics to Newtonian mechanics. 
In fact, the new laws can often be derived from the old 
laws by using the new bias to reformulate the old laws. 
This was the method Einstein used to generate relativistic 
mechanics. 

The power of Einstein’s strategy is that his framework 
scales up from special relativity through the history of 
twentieth century physics, although the mathematics 
becomes considerably more complex. From the viewpoint 
of machine learning, the power of Einstein's strategy is his 
mutual bootstrapping between the bias space and the 
hypothesis spaces by exploiting the structural relationship 
between them: the invariance principle. 

Symmetry as Bias: the Invariance Principle 

Symmetry is a unifying aesthetic principle that has been 
a source of bias in physics since ancient times. In modern 
physics this principle is stated as: ‘the laws of physics are 
invariant for all observers.’ An invariance claim is a 
universally quantified statement of the form ‘For all 
events/histories of type F, for all reference frames of type 
R, Physical Theory P holds’. An invariance claim implies 
that a group of transformations mapping measurements 
between different observers also maps physical theory P 
onto itself. Such a group of transformations defines the 
postulated symmetries of the universe, and is the type of 
bias used by theoretical physicists. The transformations are 
parameterized by the relation between two different 
observers, such as their relative orientation or velocity. For 
example, Galileo defined the following transformation 
equations relating measurements for observers in constant 
relative velocity v parallel to the x-axis: 
$\{x' = x - vt,\ t' = t\}$. These transformations are consistent 
with Newton’s theory of mechanics. 
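
As a worked check of this consistency claim, the following standard textbook computation (not taken from the paper's own derivation) applies the Galilean equations to a trajectory $x(t)$. It assumes that the mass $m$ and the force $F$ have the same values in both frames, which holds for forces that depend only on relative positions.

```latex
\begin{align*}
x'(t') &= x(t) - vt, \qquad t' = t \\
\frac{dx'}{dt'} &= \frac{dx}{dt} - v \\
\frac{d^2x'}{dt'^2} &= \frac{d^2x}{dt^2} \\
F = m\,\frac{d^2x}{dt^2} \;&\Longrightarrow\; F = m\,\frac{d^2x'}{dt'^2}
\end{align*}
```

Velocities differ between the two observers by the constant $v$, but accelerations agree, so Newton's second law keeps the same form in the primed frame.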

The invariance principle defines a consistency 
relationship between physical theories and groups of 


transformations. The following definitions are standard and 
sufficient for our purpose of understanding and 
implementing Einstein’s strategy for deriving special 
relativity. However, the reader should be aware that these 
definitions are a simple starting point for a deep, well 
developed mathematical theory that has had a profound 
impact on theoretical physics. (A good mathematical 
exposition focused on special relativity is [Aharoni 65], a 
more sophisticated philosophical and foundational 
treatment is [Friedman 83].) 

Below, $\mathcal{G}$ is a transformation group. An invariant 
operation is a special case of a covariant operation. Laws 
are invariant if they define the same relation after they are 
transformed by the action of the transformation group. A 
sufficient condition for a theory to be invariant with respect 
to a transformation group $\mathcal{G}$ is that all the operations are 
covariant and all the laws are invariant. 

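Invariance of an operation or form: 

$$\mathrm{Invariant}(\mathit{op}, \mathcal{G}) \Leftrightarrow \forall (g \in \mathcal{G},\ x_1 \ldots x_n)\ \ \mathit{op}(x_1, x_2, \ldots, x_n) = \mathit{op}(g(x_1, x_2, \ldots, x_n))$$

Covariance of an operation or form: 

$$\mathrm{Covariant}(\mathit{op}, \mathcal{G}) \Leftrightarrow \forall (g \in \mathcal{G},\ x_1 \ldots x_n)\ \ \mathit{op}(g(x_1, x_2, \ldots, x_n)) = g(\mathit{op}(x_1, x_2, \ldots, x_n))$$

$$\Leftrightarrow \forall (g \in \mathcal{G},\ x_1 \ldots x_n)\ \ \mathit{op}(x_1, x_2, \ldots, x_n) = g^{-1}(\mathit{op}(g(x_1, x_2, \ldots, x_n)))$$

Invariance of a physical law expressed as a universally 
quantified equation: 

$$\mathrm{Invariant}(\forall(\ldots)\ t1(\ldots) = t2(\ldots),\ \mathcal{G}) \Leftrightarrow \forall (g \in \mathcal{G},\ x_1 \ldots x_n)$$

$$t1(x_1, x_2, \ldots, x_n) = t2(x_1, x_2, \ldots, x_n) \Leftrightarrow t1(g(x_1, x_2, \ldots, x_n)) = t2(g(x_1, x_2, \ldots, x_n))$$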

More generally, a theory is invariant with respect to a 
transformation group $\mathcal{G}$ iff all the transformations in the 
group are automorphisms of the models of the theory. This 
is equivalent to proving that the theory and the 
transformation equations together imply the same theory in 
other frames of reference (though see [Friedman 83] for 
qualifications). 

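Invariance of a theory T: 

$$\mathrm{Invariant}(T, \mathcal{G}) \Leftrightarrow \forall (g \in \mathcal{G})\ \ \mathrm{Models}(T) = g(\mathrm{Models}(T))$$

$$\Leftrightarrow \forall (g \in \mathcal{G})\ \ T \models T/g \ \text{ and } \ T/g \models T$$

where $T/g$ denotes substituting variables with the 
terms defined by the transformation equations. 

Because of the inverse property of groups, 
the two conjuncts imply each other: 

$$\forall (g \in \mathcal{G})\ \ T \models T/g^{-1}$$

$$\text{implies}\quad \forall (g \in \mathcal{G})\ \ T/g \models (T/g^{-1})/g$$

$$\text{implies}\quad \forall (g \in \mathcal{G})\ \ T/g \models T$$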

To check an invariant predicate, the Erlanger program 
back-substitutes transformation equations into a form or 
law and then compares the result to the original form or 
law. If the function or relation is the same, then the 
invariant predicate is true. In essence the Erlanger program 
assumes the law holds good in the original reference frame 
and then transforms the law into measurements that would 
be observed in a new frame of reference. (This can be done 
independent of whether the law is invariant.) If these 
measurements agree with the law stated in the new frame 
of reference, then the law is invariant. The steps of the 
algorithm are described below and illustrated with the 
example of determining whether the Galilean 
transformations are consistent with the constant speed of 
light, Einstein’s second postulate. The input is the 
definition of the law for the constant speed of light, and the 
transformation equations relating variables in the original 
frame of reference to the variables in the new (primed) 
frame of reference: 

$\mathrm{Invariant}(x^2 = c^2 t^2,\ \{x = x' + vt',\ t = t'\})$ 

1. Solve the law in the new frame of reference to derive 
expressions for dependent variables. (This turns a relation 
between variables into a disjunction of 
substitutions.): $\{\{x' = ct'\},\ \{x' = -ct'\}\}$ 

2. Use the parameterized transformation equations to 
substitute expressions in the new frame of reference for 
variables in the old frame of reference; this yields a new 
law relating measurements in the new frame of reference: 

$(x' + vt')^2 = c^2 t'^2$ 

3. The substitutions derived in step 1 are applied to the 
new law derived in step 2: 

$\{(ct' + vt')^2 = c^2 t'^2,\ (-ct' + vt')^2 = c^2 t'^2\}$ 

4. If the law(s) derived in step 3 is a valid equality(ies), 
then the law(s) is invariant. For this example they are not, 
so the Erlanger program determines that Einstein's second 
postulate is inconsistent with the Galilean transformations. 
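
The four steps can be imitated numerically. The sketch below is not the Erlanger program, which works symbolically in Mathematica; it is a hedged Python approximation that replaces the symbolic equality test of step 4 with spot checks at sampled values. The function names, the sample values, and the inclusion of the Lorentz transformations for comparison are assumptions made for illustration.

```python
import math

def galilean(x_new, t_new, v, c):
    """Old-frame coordinates from new-frame ones: x = x' + v t', t = t'."""
    return x_new + v * t_new, t_new

def lorentz(x_new, t_new, v, c):
    """Old-frame coordinates under the standard Lorentz transformations."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x_new + v * t_new), gamma * (t_new + v * x_new / c ** 2)

def light_law_residual(x, t, c):
    """Zero exactly when the law x^2 = c^2 t^2 holds."""
    return x ** 2 - (c * t) ** 2

def is_invariant(transform, c=1.0, speeds=(0.1, 0.5, 0.9), times=(0.5, 1.0, 2.0)):
    """Spot-check invariance of x^2 = c^2 t^2 under `transform`."""
    for fraction_of_c in speeds:
        v = fraction_of_c * c
        for t_new in times:
            # Step 1: solutions of the law in the new frame, x' = c t' and x' = -c t'.
            for x_new in (c * t_new, -c * t_new):
                # Steps 2 and 3: express the old-frame variables through the
                # new-frame ones and apply the substitutions from step 1.
                x, t = transform(x_new, t_new, v, c)
                # Step 4: the resulting law must be a valid equality.
                if not math.isclose(light_law_residual(x, t, c), 0.0, abs_tol=1e-9):
                    return False
    return True

galilean_ok = is_invariant(galilean)   # False: the second postulate is violated
lorentz_ok = is_invariant(lorentz)     # True at every sampled point
```

A sampled check can only refute invariance with certainty; confirming it in general requires the symbolic comparison described in the steps.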

Deriving a New Bias 

The invariance principle can be used not only to verify 
that a physical law is consistent with a particular bias, but 
also to generate a new bias when a physical law is 
inconsistent with the current bias, as when the constant 
speed of light is inconsistent with the Galilean 
transformations. There are important structural aspects of 
the invariance principle that enabled this aspect of 
Einstein’s strategy to succeed. In particular, the consistency 
relationship is contravariant: a weaker physical theory is 
consistent with a larger set of transformations. (For the 
purposes of this paper, 'weaker' can be thought of as 'fewer 
deductive consequences', though this is not entirely correct. 
This only holds if each law transforms into itself.) Thus 
when an inconsistency is detected between a bias 
represented by a set of transformations and an evolving 
physical theory, the physical theory can be relaxed, leading 
to an enlarged set of transformations. This enlarged set is 
then filtered to compute the new bias. 

Assume that a physical theory T (e.g. Newton's 
mechanics) is consistent with a transformation group G 
(e.g. the Galilean group). Further assume that G is the 
largest transformation group consistent with T. Then a new 
empirical fact e is observed (e.g. the constant speed of 
light), such that e is not consistent with G. Then T is 
relaxed to T' (e.g. Newton’s first law), thereby enlarging G 
to G' (e.g. the set of all linear transformations). The new 
bias is the subset of G', i.e. G'' (e.g. the Lorentz group), 
such that T' with e is consistent with G''. Then the laws in 
(T − T') are transformed so that they are consistent with 
G'' and have as limiting approximations the original laws. 
This section describes an implemented algorithm for 
deriving G'' while the next sections describe transforming 
the laws in (T − T'). These same algorithms can also be 
used when trying to unify theories with different biases, 
such as Newton's mechanics and Maxwell's 
electromagnetism. 

The Lorentz group is a set of transformations that relate 
the measurements of observers in constant relative motion. 
The Lorentz group is a sibling to the Galilean group in the 
space of biases. Einstein’s derivation of the Lorentz 
transformations implicitly relied upon structural properties
of the lattice of transformation groups. In particular,
Einstein constrained the form of the transformations with 
an upper bound, derived from Newton’s first law: a body in 
constant motion stays in constant motion in the absence of 
any force. This is his assumption of inertial reference 
frames, an assumption he relaxed in his theory of general 
relativity. The largest set of transformations consistent with 
Newton’s first law are the four dimensional linear 
transformations. Of these, the spatial rotations and 
spatial/temporal displacements can be factored out of the 
derivation, because they are already consistent with 
Einstein’s second postulate. (The Erlanger program does 
not currently have procedures implemented to factor out 
subgroups of transformations; these are under
development.) This leaves an upper bound for a subgroup
with three unknown parameters (a, d, f) whose independent
parameter is the relative velocity (v):

x = a(x' + vt'),  t = dx' + ft'

This upper bound includes both the Galilean
transformations and the Lorentz transformations. The
DeriveNewBias algorithm takes the definition of an upper
bound, such as the one above, including lists of the 
unknown and independent parameters, a list of invariants, a 
list of background assumptions, and information on the 
group properties of the upper bound. When this algorithm 
is applied to Einstein’s second postulate of the constant 
speed of light, the derivation of the Lorentz transformations 
proceeds along roughly the same lines as that in Appendix 
1 of [Einstein 1916]. This derivation and others are
essentially a gradual accumulation of constraints on the 
unknown parameters of the transformations in the upper 
bound, until they can be solved exactly in terms of the 
independent parameter which defines the relation between 
two reference frames. The algorithm is described below, 
illustrated with the example of deriving the Lorentz 
transformations. 

The input in this example to the DeriveNewBias 
algorithm is the upper bound given above, two invariants 
for a pulse of light - one going forward in the x direction 
and one going backwards in the x direction
{x = ct, x = -ct}, the assumptions that the speed of light is
not zero and that the relative velocity between reference 
frames is less than the speed of light, and information for 
computing the inverse of a transformation. The steps of the 
DeriveNewBias algorithm: 

1. Constraints on the unknown parameters for the 
transformation group are derived separately from each 
individual invariant. This step is similar to the procedure 
which checks whether a law is invariant under a 
transformation group. However, instead of steps 3 and 4 of 
that procedure, the system of equations from steps 1 and 2
is jointly solved for constraints on the unknown
parameters. For the two invariants for a pulse of light, the
derived constraints are: 

a = (-c^2 d + cf) / (c - v),  a = (c^2 d + cf) / (c + v)


2. The constraints from the separate invariants are 
combined through Mathematica's SOLVE function. In the
example of the Lorentz derivation, this reduces the
unknown parameters to a single unknown (f):

a = f,  d = (fv)/c^2

3. In the last step, the group properties are used to 
further constrain the unknown parameters. Currently the 
implemented algorithm only uses the inverse property of a 
group, but the compositional property is another source of 
constraints that could be exploited. First, the constraints on 
the unknown parameters are substituted into the upper 
bound transformation definition, yielding a more 
constrained set of transformations. For the Lorentz example 
this yields: 

x = f(x' + vt'),  t = ft' + fvx'/c^2


Second, the inverse transformations are computed. The 
information given to the algorithm on the group properties 
of the upper bound defines how the independent parameter
for the transformation is changed for the inverse
transformation. For relative velocity, this relation is simply
to negate the relative velocity vector. This then yields the
inverse transformations:

x' = f(x - vt),  t' = ft - (fvx)/c^2

The inverse transformations are then applied to the right 
hand side of the uninverted transformations, thereby 
deriving expressions for the identity transformation: 


x = f(f(x - vt) + v(ft - fvx/c^2))
t = f(ft - fvx/c^2) + (fv/c^2) f(x - vt)

These expressions are then solved for the remaining 
unknown parameters of the transformation (e.g. f), whose
solution is substituted back into the transformations:

x = c(x' + vt') / (sqrt(c - v) sqrt(c + v))
t = (c^2 t' + vx') / (c sqrt(c - v) sqrt(c + v))

The result is the new bias, which in this example is 
equivalent to the standard definition of the Lorentz 
transformations (the definitions above are in Mathematica's
preferred normal form). 
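The three steps can be followed numerically for a single value of v. The sketch below is an illustration only, not the implemented DeriveNewBias (which is symbolic); the function names are hypothetical, and it assumes, as the derivation above does, that the unknown f is unchanged when v is negated.

```python
import math
from fractions import Fraction

def light_pulse_constraints(v, c):
    """Steps 1-2: write a = alpha*f and d = delta*f and solve for (alpha, delta).

    Forward pulse  (x = ct):   alpha*(c + v) - c^2*delta = c
    Backward pulse (x = -ct):  alpha*(c - v) + c^2*delta = c
    The 2x2 linear system is solved with Cramer's rule.
    """
    a11, a12 = c + v, -c**2
    a21, a22 = c - v, c**2
    det = a11 * a22 - a12 * a21
    alpha = (c * a22 - a12 * c) / det
    delta = (a11 * c - c * a21) / det
    return alpha, delta

def derive_new_bias(v, c):
    """Return (a, d, f) for x = a(x' + v t'), t = d x' + f t'."""
    v, c = Fraction(v), Fraction(c)
    alpha, delta = light_pulse_constraints(v, c)
    # Step 3: the inverse transformation is obtained by negating v.
    alpha_inv, delta_inv = light_pulse_constraints(-v, c)
    # Composing with the inverse must give the identity; the coefficient
    # of x in that composition is alpha*f^2*(alpha_inv + v*delta_inv) = 1.
    f = 1.0 / math.sqrt(alpha * (alpha_inv + v * delta_inv))
    return float(alpha) * f, float(delta) * f, f

a, d, f = derive_new_bias(Fraction(3, 5), 1)
print(a, d, f)
```

With v = 3/5 and c = 1 the three values agree, up to rounding, with f = 1/sqrt(1 - v^2/c^2) = 1.25, a = f and d = fv/c^2 = 0.75.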

Limit Homomorphisms: Approximations 
between Theories. 

Once a new bias is derived, a learning algorithm needs 
to transfer the results of learning in the old bias space into 
the new bias space. Unless the relationship between the old 
bias and the new bias can be exploited, in the worst case 
this means running the learning algorithm with the new 
bias over all the examples used to derive the old theory. 
The shift in bias from the Galilean transformation group to 
the Lorentz transformation group required a global 
reformulation of all the theories of physics, from 
kinematics to fluid dynamics, and later quantum 
mechanics. Yet in all these reformulations, the relativistic 
theory was derived from its non-relativistic counterpart 
without exhaustively considering the experimental 
evidence justifying the non-relativistic theory. This was 
done by treating the non-relativistic theory as an 
approximation to the new, unknown relativistic theory; and 
combining this constraint with the Lorentz transformations 
to derive a corresponding relativistic theory. 

A theory such as Newton’s mechanics that has a high 
degree of experimental validation over a range of 
phenomena (e.g. particles interacting at low velocities 
compared to the speed of light), represents a summary of 
many experimental facts. If a new theory is to account for 
these same experimental facts, it must agree with the 
observable predictions of the old theory over the same 
range of phenomena. Hence the old theory must 
approximate, to within experimental error, the new theory 
over this range of phenomena (and vice versa). By showing 
that an old theory is a limiting approximation to a new 
theory, it is unnecessary to exhaustively reconsider all the 
experimental evidence justifying the old theory. This 
approximation criterion for partially validating a new theory
is well accepted, both within scientific communities and
within the philosophy of science. However, the
development of relativity theory went beyond a post-hoc
verification of this approximation criterion: the
approximation criterion was used to derive the new theory.

Various notions of “approximation” have been 
developed in AI to support reasoning between approximate 
theories, and even generating approximate theories from 
detailed theories [Ellman 90,92]. The problem of 
generating a new theory from an approximate theory and a 
new bias requires a precise definition of approximation 
with a well defined semantics. This section describes limit 
homomorphisms , which are homomorphisms that only hold 
in the limiting value of some parameter. Limit 
homomorphisms can be viewed as an extension of fitting 
parameter approximations [Weld 92] with additional 
algebraic structure that adds the constraints needed to 
derive the new theory, and not just model the
approximation relation. 

A limit homomorphism is a map m from one domain to 
another such that for corresponding functions f_1 and f_2 the
following equality converges as the limit expression goes
to the limiting value:

lim_{expr -> value} m(f_1(x_1, ..., x_n)) = f_2(m(x_1), ..., m(x_n))

Another motivation for this definition of approximation 
is to resolve a fundamental disagreement between Kuhn's 
view of the paradigm shift from Newtonian to relativistic 
physics, and the view of most physicists. Most physicists 
agree with the logical positivists: Newtonian physics is a 
limiting approximation of Einstein's physics. Kuhn argues 
that this is spurious [Kuhn 62, pg. 102], because the 
corresponding concepts in the relativistic and Newtonian 
mechanics are different. A limit homomorphism combines 
a map between corresponding concepts and a limiting 
approximation, thus achieving a limiting approximation 
between two different conceptual domains.

A well known type of limit homomorphism within
computer science is Ω-order computational complexity.
For example, the Ω-order computational complexity of a
sequence of program statements is the maximum of the
Ω-order computational complexity of the individual
statements:

lim_{input -> ∞} Ω(S_1; S_2; ...; S_n) = Max(Ω(S_1), Ω(S_2), ..., Ω(S_n))

To determine the Ω-order computational complexity of a
program, this limit homomorphism is recursively applied to 
the definition of a program. Similarly, to determine the 
non-relativistic quantity corresponding to a relativistic 
quantity, the appropriate limit homomorphism is 
recursively applied to the definition of the relativistic 
quantity. 
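As a small worked instance of the complexity example, suppose (purely for illustration; these cost functions are assumed, not taken from any particular program) that two statements have running times n^2 and n on an input of size n. The cost of the sequence and the maximum of the individual costs then agree only in the limit:

```latex
\[
\mathrm{cost}(S_1) = n^2, \qquad
\mathrm{cost}(S_2) = n, \qquad
\mathrm{cost}(S_1; S_2) = n^2 + n,
\]
\[
\lim_{n \to \infty} \frac{\mathrm{cost}(S_1; S_2)}{\max(\mathrm{cost}(S_1), \mathrm{cost}(S_2))}
  = \lim_{n \to \infty} \frac{n^2 + n}{n^2}
  = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right) = 1 .
\]
```

For any finite n the two sides differ by the lower-order term, which is why the relation is a homomorphism only at the limiting value.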

Within physics, limit homomorphisms define the 
relationship between new, unified theories and the older 
theories they subsume. If the mapping function m is 
invertible, then a limit homomorphism can be defined in
the reverse direction. The limit homomorphisms between
Newton's mechanics and different formulations of
relativistic mechanics are invertible. Thus from an a priori,
mathematical viewpoint neither Newtonian mechanics nor
relativistic mechanics is intrinsically more general than the
other: the mathematical relationship is symmetric; each is
a limit homomorphism of the other. These theories agree 
on their predictions when velocities are low, but diverge as 
velocities approach the speed of light. Relativistic 
mechanics is a posteriori more general because its 
predictions agree with experimental facts for high 
velocities, hence the theory is more generally applicable. 
Relativistic mechanics is also extrinsically more general in 
the sense that its bias is consistent with electrodynamics, 
and hence relativistic mechanics and electrodynamics can 
be unified. 


BEGAT: (BiasEd Generalization of 
Approximate Theories) 

While the intrinsic mathematical relationship between 
Newtonian and relativistic physics is not one of 
generalization [Friedman 83], the process of generating 
relativistic mechanics from Newtonian mechanics is one of 
generalization. This section describes the mathematics 
justifying this process, and an implemented algorithm 
based on these mathematics that derives relativistic 
kinematics. Extensions currently undergoing 
implementation are described that will enable it to derive 
different formulations of relativistic dynamics. 

It is clear from a reading of [Einstein 1905] that 
Einstein derived relativistic mechanics from Newtonian 
mechanics, by treating the latter as a limiting 
approximation that was valid in low velocity reference 
frames and applying the Lorentz transformations in order
to generalize to the relativistic laws. For example, in
section 10, paragraph 2 of [Einstein 1905]: “If the electron
is at rest at a given epoch, the motion of the electron ensues
in the next instant of time according to the equations
[Newton’s equations of motion] ... as long as its motion is
slow.” Einstein then generalized to the relativistic equation
of motion by applying the Lorentz transformations to 
Newton’s equations of motion. Einstein even constrained 
the laws of relativistic dynamics to have the same form as 
Newtonian dynamics. 

This point needs to be made because [Kuhn 62], which 
many in AI take as a definitive source on scientific 
revolutions, argues otherwise with respect to the genetic 
relationship between Newtonian and relativistic mechanics 
[Kuhn 62, pg. 103]: “Though an out-of-date theory can 
always be viewed as a special case of its up-to-date 
successor, it must be transformed for the purpose. And the 
transformation is one that can be undertaken only with the 
advantages of hindsight, the explicit guidance of the more
recent theory ... but it [the old theory] could not suffice for
the guidance of research.” The first sentence is true, but the
remaining part of the paragraph is demonstrably false as
applied to Einstein’s derivation of relativistic mechanics. 
As is clear from the selection of Einstein's paper in the 
preceding paragraph, Einstein not only used Newton's 
theory to guide his search for the proper relativistic laws, 
he transformed, with foresight, the old (Newtonian) laws to 
obtain the new (relativistic) laws. Few physicists or 
philosophers/historians of science currently subscribe to 
Kuhn’s interpretation. 

When both the old theory and the new theory comply 
with the invariance principle, then the difference in the 
biases will determine the limit point, i.e. the range of 
phenomena over which they must agree. The following 
mathematical sketch explains what this limit point must be, 
when the theories postulate the same number of
dimensions. The two biases will share some subgroups in
common (e.g. the spatial rotations) and differ in other
subgroups (e.g. the subgroup for relative velocity). For the
subgroups that differ, the identity transformations will be
the same. Hence the value of the parameter (e.g. relative 
velocity) that yields the identity transformation must be the 
limit point (e.g. 0 relative velocity). Furthermore, assuming 
that the transformations in the differing subgroups are a 
continuous and smooth function of their parameter(s), and
that the functions in the respective theories are smooth and 
continuous, then the bounding epsilon-delta requirements 
for a limit are satisfied. 

Thus, given a new bias, the new theory must be derived 
so that it satisfies two constraints: the theory is invariant 
under the new bias, and the old theory is a limit 
homomorphism of the new theory. The limit 
homomorphisms between Newtonian physics and 
relativistic physics can be defined through the composition 
of tupling (or projections) that are invertible, with Lorentz 
transformations applied to the various entities of the theory. 
Because the Lorentz transformations are also invertible, the 
composition is invertible. In other words, the limit 
homomorphism is defined through a standard 
homomorphism at the limit point, which will be denoted h , 
and Lorentz transformations denoted g. 

The two constraints on the new theory, that it be 
invariant under the new bias and that it have as a limiting 
approximation the old theory, can be solved to generate the 
new theory when the limit homomorphism is invertible. 
The new theory and the limit homomorphism are derived in 
tandem. In essence, the transformations in the new bias are
used to ‘rotate away’ from the limit point, as Einstein
‘rotated’ a description of Newton’s equations for an 
electron initially at rest to reference frames in which it was 
not at rest. (Here ‘rotate’ means applying the 
transformations in the subgroups of the new bias not 
contained in the old bias, e.g. the Lorentz transformations.) 

For the operations of the new theory, these two 
constraints can often be directly combined as follows: 

1. New, unknown operation is covariant wrt new bias:

op(g(x_1, x_2, ..., x_n)) = g(op(x_1, x_2, ..., x_n))

Equivalently: op(x_1, x_2, ..., x_n) = g^{-1}(op(g(x_1, x_2, ..., x_n)))

2. New, unknown operation has limit homomorphism to old
operation op':

lim_{x_1, ..., x_n -> limit point} h(op(x_1, x_2, ..., x_n)) = op'(h(x_1), h(x_2), ..., h(x_n))

Thus: op(x_1, x_2, ..., x_n) =

g^{-1}(h^{-1}(op'(h(g(x_1)), h(g(x_2)), ..., h(g(x_n)))))

where g(x_1, x_2, ..., x_n) = limit point

In words, the new operation is obtained by:

1. Finding a transformation g that takes its arguments to 
a reference frame where the old operation is valid 

2. Applying the inverse transformation to define the 
value of the new operation in the original reference frame. 
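The combined formula follows from the two constraints by a short substitution, spelled out here as a sketch. The shorthand x for the argument tuple and y = g(x) for its image at the limit point is introduced only for brevity and is not part of the original notation.

```latex
\begin{align*}
\mathrm{op}(x) &= g^{-1}\bigl(\mathrm{op}(g(x))\bigr)
  && \text{(constraint 1, covariance)} \\
h\bigl(\mathrm{op}(y)\bigr) = \mathrm{op}'\bigl(h(y)\bigr)
  \;&\Rightarrow\; \mathrm{op}(y) = h^{-1}\bigl(\mathrm{op}'(h(y))\bigr)
  && \text{(constraint 2 at the limit point, $h$ invertible)} \\
\mathrm{op}(x) &= g^{-1}\Bigl(h^{-1}\bigl(\mathrm{op}'(h(g(x)))\bigr)\Bigr)
  && \text{(substituting $y = g(x)$)}
\end{align*}
```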

Applying BEGAT to derive the laws of the new theory 
is a similar two step process: first, a transformation is 
determined that takes the variables to a reference frame in 
which the old laws are valid, and then the inverse
transformations are symbolically applied to the equations
for the old laws. 

The algorithm is underconstrained, because of the 
interaction of the definition of the new (unknown) 
operation and the definition of the (unknown) 
homomorphism h. In parts of [Einstein 1905], Einstein 
assumes that h is the identity, for example in his derivation 
of the relativistic composition of velocities (described
below), and then derives an expression for the new 
operation. In other parts of [Einstein 1905], he assumes that 
the old operation and the new operation are identical, for 
example in his derivation of the relativistic equation of 
motion. In that derivation he kept the same form as the 
Newtonian equation (i.e. force = mass * acceleration) and 
then solved for a relativistic definition of inertial mass, and 
hence h. To his credit, Einstein recognized that he was 
making arbitrary choices [Einstein 1905 section 10, after 
definition of transverse mass]: “With a different definition 
of force and acceleration we should naturally obtain other 
values for the masses.” 

The following illustrates how the BEGAT algorithm 
works for a simple operation when h is the identity. 

Note that when h is the identity: op(x_1, x_2, ..., x_n) =

g^{-1}(op'(g(x_1), g(x_2), ..., g(x_n)))

where g(x_1, x_2, ..., x_n) = limit point

BEGAT takes as input the definition of the old 
operation, the list of transformations for the new bias, and a 
definition of the limit point. For the composition of 
velocities, the old operation is simply the addition of 
velocities: 

Newton-Compose(v1, v2) = v1 + v2 where:
v1 is the velocity of reference frame R_1 w.r.t. R_0
v2 is the velocity of object A w.r.t. reference frame R_1
and the output is defined in reference frame R_0


The transformations are the Lorentz transformations 
derived earlier. The limit point is when R_1 is the same as
R_0, i.e. v1 = 0. The first part of the reasoning for the
BEGAT algorithm is at the meta-level, so it is necessary to
understand some aspects of the notation used in the
Erlanger program. Variables are represented by an
uninterpreted function of the form:

var[event, component, reference-frame]. This form
facilitates pattern matching. Transformations have
representations both as lists of substitutions and as a
meta-level predicate of the form:

Transform[start-frame, end-frame, independent-parameter].
The independent parameter for relative velocity
has the form: var[end-frame, relvelocity, start-frame]. Thus
v1 is represented as var[R_1, relvelocity, R_0] and v2 as
var[A, velocity, R_1].

1. BEGAT first solves for g, the transformation which 
takes the arguments to the limit point. This transformation 
maps the reference frame for the output to the reference 
frame for the limit point. The result is obtained by applying 
a set of rewrite rules at the meta-level: 

Transform[R_0, R_1, var[R_0, relvelocity, R_1]]

This transformation maps reference frame R_0 to
reference frame R_1.

2. BEGAT next solves for the value of the variables 
which are given to the old operation, i.e. g(v1), g(v2). For
g(v1) it symbolically solves at the meta-level for

Apply[Transform[R_0, R_1, var[R_0, relvelocity, R_1]],
      var[R_1, relvelocity, R_0]],

obtaining var[R_1, relvelocity, R_1], i.e. g(v1) = 0.

For g(v2) it symbolically solves at the meta-level for

Apply[Transform[R_0, R_1, var[R_0, relvelocity, R_1]],
      var[A, velocity, R_1]],

obtaining var[A, velocity, R_1], i.e. g(v2) = v2 since v2 is
measured in R_1.

This meta-level reasoning about the application of 
transformations is necessary when the input variables and 
the output variables are defined in different reference 
frames. 

3. BEGAT next symbolically applies the old operation 
to the transformed variables: 

Newton-compose(g(v1), g(v2)) = 0 + v2 = v2

4. BEGAT finally applies the inverse transformation to 
this result to obtain the definition for the relativistic 
operation: Relativistic-compose(v1, v2) =

Apply[Transform[R_1, R_0, var[R_1, relvelocity, R_0]],
      var[A, velocity, R_1]]

The transformation derived previously for velocities is 
now applied to var[A, velocity, R_1], yielding the definition
of the operator for relativistic composition of velocities: so
BEGAT calls DeriveCompositeTransformation with the
definition for velocity (v = Δx/Δt) and the Lorentz
transformations for the components of the definition of
velocity, namely the transformations for the x coordinate
and the time coordinate derived earlier.
DeriveCompositeTransformation then symbolically applies
these transformations to the components of the definition,
and then calls Mathematica’s SOLVE operation to
eliminate the Δx, Δt components from the resulting
expression. The result is the same definition as Einstein 
obtained in section 5 of [Einstein 1905]: 

Relativistic-compose(v1, v2) = (v1 + v2) / (1 + (v1 v2)/c^2)
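The four BEGAT steps for this example can be traced numerically. The sketch below is an illustration under stated assumptions rather than the implemented algorithm: the meta-level rewriting of steps 1 and 2 is replaced by hard-coded results, the elimination of Δx and Δt is done by division instead of a symbolic solve, and all function names are hypothetical.

```python
import math

def lorentz_r1_to_r0(dx1, dt1, v1, c):
    """Map an interval (dx, dt) measured in R1 to R0.

    v1 is the velocity of R1 w.r.t. R0, as in the text.
    """
    gamma = 1.0 / math.sqrt(1.0 - (v1 / c) ** 2)
    return gamma * (dx1 + v1 * dt1), gamma * (dt1 + v1 * dx1 / c**2)

def newton_compose(v1, v2):
    """The old operation: plain addition of velocities."""
    return v1 + v2

def relativistic_compose(v1, v2, c=1.0):
    # Steps 1-2: g takes the arguments to the limit point, where R1
    # coincides with R0, so g(v1) = 0 and g(v2) = v2 (v2 is measured in R1).
    g_v1, g_v2 = 0.0, v2
    # Step 3: apply the old operation at the limit point.
    v_in_r1 = newton_compose(g_v1, g_v2)
    # Step 4: apply the inverse transformation. Velocity is dx/dt, so
    # transform an interval with dt = 1 and dx = v, then take the ratio.
    dx0, dt0 = lorentz_r1_to_r0(v_in_r1, 1.0, v1, c)
    return dx0 / dt0

print(relativistic_compose(0.5, 0.5))    # close to 0.8, not 1.0
print(relativistic_compose(1e-4, 2e-4))  # close to the Newtonian 3e-4
```

The second call illustrates the limit homomorphism: for velocities small compared with c, the relativistic operation and Newton-Compose give nearly the same value.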


Deriving Relativistic Dynamics 

This subsection describes how the invariance principle 
can be used to derive other components of the new theory 
and the limit homomorphism, illustrated with one 
derivation of relativistic dynamics. Different background 
assumptions lead to different limit homomorphisms m and 
different formulations of the equations for relativistic 
dynamics. In his original paper, Einstein reformulated the 
Newtonian equation by measuring the force in the 
reference frame of the moving object and the inertial mass
and acceleration in the reference frame of the observer. (In
essence, Einstein did not complete step 2, for reasons too 
complex to explain here.) This leads to a projection of the 
Newtonian mass into separate transverse and longitudinal 
relativistic masses. 

A subsequent formulation of relativistic dynamics 
consistently measures masses, accelerations, momentum 
and energy in the reference frame of the observer, resulting 
in a single relativistic mass that varies with the speed of the 
object. In this formulation the mass of a system is the sum 
of the masses of its components, and is conserved in elastic 
collisions. The modern formulation of relativistic 
dynamics, based on Minkowski's space-time and Einstein's 
tensor calculus, requires that components that transform 
into each other be tupled together. Thus because time 
coordinates transform into spatial coordinates, time and 
space are tupled into a single 4-vector. Consequently 
energy and momentum are also tupled together. In this case 
m maps Newtonian inertial mass to rest mass, and maps 
Newtonian acceleration and forces to their 4-vector 
counterparts. 

In all three cases the derivation strategy is based directly 
on the invariance principle and the principle that the non- 
relativistic theory be a limiting approximation to the 
relativistic theory. The strategy is to assume that the laws 
of dynamics are invariant under the Lorentz 
transformations, and then to solve for the limit 
homomorphism that makes them invariant. (If it is not 
possible to consistently solve for the limit homomorphism, 
then the theory cannot be invariant.) These limit 
homomorphisms are composed of two maps: first a tupling 
or projection map from the components of the original 
theory to components of the new theory (H), and second
the Lorentz transformations for components of the new
theory (g). These two maps are generated by the
derivations. 

Derivations based on the tensor calculus are the most 
elegant because the tensor calculus is essentially a syntactic 
encoding of the invariance principle, as applied to biases
defined by groups of linear homogeneous transformations.
However, an explanation of the group-theoretic basis of the 
tensor calculus is beyond the scope of this paper. Instead 
we will describe the justification and strategy that applies 
to the first two derivations of relativistic dynamics, and 
then illustrate it with part of Einstein's original derivation. 
This derivation has been partially simulated in interactive 
mode with Mathematica 1.2. The justification and steps of 
this derivation are also the same as that for relativistic 
electrodynamics; more specifically, the derivation of the 
Lorentz transformations for electric and magnetic fields. 

Recall the definition of the invariance of a theory under
a transformation group G, where NT is the new theory:

Invariant(NT, G) ≡ ∀(g ∈ G) NT/g ⊨ NT

This is combined with the constraint that the old theory
is a limit homomorphism of the new theory, where LH is
the definition of the components of the old theory in terms
of the components of the new theory:

NT ∪ LH ⊨ OT

When the limit homomorphism is invertible, we also
have:

OT ∪ LH^{-1} ⊨ NT

Because this inverse limit homomorphism can be
factored into a tupling/projection map H and the new bias
g, this last constraint can be combined directly with the
invariance principle to yield a single constraint between the
old theory and the new theory:

∀(g ∈ G) (OT ∪ H)/g ⊨ NT

By the definition of a limit homomorphism, the old 
theory is defined with respect to the reference frame for 
which the limiting value holds (e.g. zero relative velocity). 
The transformations in G take the result of applying the
tupling/projection map H to this reference frame and
transform it to all other reference frames. The constraint is
satisfied when the new theory, defined with respect to any
reference frame R, is a consequence of the old theory, the
tupling/projection map H, and the transformation g from
the reference frame for the old theory to the reference
frame R. We will now show how this constraint can be
used to derive the new theory, illustrated with Einstein's 
derivation of relativistic dynamics. 

In all derivations of relativistic dynamics, it is assumed 
that the new equation has the same form as the Newtonian 
equation, but that the definition of the components might 
be different, according to H and g. Thus if H and g are
partially known, say H' and g' are defined for some of the
components, then the remaining parts of H and g are
derived by setting up the following unified constraint and
solving for the remaining parts of the limit homomorphism:

∀(g ∈ G) (OT ∪ H')/g ⊨ OT'

where OT' has the same form as the Newtonian theory
but with new variables which are functions of
corresponding variables in OT and the parameters of the
transformation group G.

Einstein's derivation of relativistic dynamics proceeded 
as follows. First, the old theory (OT) was Newton's
dynamics relating a particle's inertial mass, acceleration, 
and the force exerted upon the particle (Einstein considered 
the case where the force was exerted by an electric field 
with a particle of charge e). This law is valid in the 
reference frame of the particle: 

OT ≡ { m d^2x/dt^2 = eE_x,  m d^2y/dt^2 = eE_y,  m d^2z/dt^2 = eE_z }

Through previous derivations, H' and g' were known
for space, time, and electromagnetic fields, though Einstein
did not use the transformations for the electromagnetic
field. The map H' for space and time was the identity,
while g' was the Lorentz transformation equations for
space and time generated by DeriveNewBias. (A different
background assumption where H tuples space and time
into a single 4-vector would yield the tensor formulation of 
relativistic dynamics). Thus Einstein needed to solve for 
the relativistic definition of inertial mass as a function of 
the non-relativistic mass and the parameter of the Lorentz
transformation group, namely the relative velocity
between reference frames. Because the relative velocity is a
vector quantity with x, y, z components, the definition of the
inertial mass is also set up with x, y, z components. These
components of the inertial mass might later be identified. In
the following, v is the relative velocity between the
reference frame of the particle and an observer moving in
the positive x direction, and β is a term defined with
respect to the magnitude of v: β = 1 / sqrt(1 - v^2/c^2). The
unprimed variables are in the reference frame of the
particle, while the primed variables are in the reference 
frame of the observer. The constraint relating Newton’s 
dynamics, the Lorentz transformations, and relativistic 
dynamics is instantiated from the unified constraint above: 


∀v: OT ∪ { x = β(x' + vt'),  y = y',  z = z',  t = β(t' + vx'/c^2) } ⊨

{ m'_x(m,v) d^2x'/dt'^2 = eE_x,
  m'_y(m,v) d^2y'/dt'^2 = eE_y,
  m'_z(m,v) d^2z'/dt'^2 = eE_z }


Note that Einstein defines the force in the reference frame 
of the particle, even on the right hand side. The equations 
for Newton’s dynamics are then partially transformed into 
the reference frame of the observer by applying the Lorentz 
transformations, yielding a simplified constraint: 


{ mβ^3 d^2x'/dt'^2 = eE_x,
  mβ^2 d^2y'/dt'^2 = eE_y,
  mβ^2 d^2z'/dt'^2 = eE_z } ⊨

{ m'_x(m,v) d^2x'/dt'^2 = eE_x,
  m'_y(m,v) d^2y'/dt'^2 = eE_y,
  m'_z(m,v) d^2z'/dt'^2 = eE_z }


This constraint is then solved for definitions of the 
relativistic inertial mass in terms of the Newtonian inertial 
mass and the parameter between the reference frame of the 
particle and the observer. Solving this constraint is a simple 
directed inference problem [Smith 91]; reasoning 
backwards from the right hand side a match is derived 
between the variables for the relativistic inertial mass and 
terms on the left hand side: 
m'_x(m,v) = mβ^3
m'_y(m,v) = mβ^2
m'_z(m,v) = mβ^2

The definitions for the y and z components of the 
inertial mass are identical, so they can be combined into a 
single 'transverse' inertial mass. In alternative derivations 
of relativistic dynamics, all the components of the inertial 
mass are identical. 
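The factors β^3 and β^2 can be checked numerically by comparing accelerations in the two frames. The sketch below is an independent sanity check under stated assumptions, not the derivation tactic used by BEGAT: it replaces the symbolic matching with finite differences, uses units with c = 1, and all names and sample values are hypothetical.

```python
import math

def lorentz(xp, tp, v, c):
    """Particle-frame (x, t) from observer-frame (x', t')."""
    beta = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return beta * (xp + v * tp), beta * (tp + v * xp / c**2)

def second_derivative(points):
    """Three-point estimate of d^2q/dt^2 on an unevenly spaced grid."""
    (t0, q0), (t1, q1), (t2, q2) = points
    return 2.0 * ((q2 - q1) / (t2 - t1) - (q1 - q0) / (t1 - t0)) / (t2 - t0)

def mass_factors(v, c=1.0, accel=1.0, h=1e-3):
    """Return (m'_x / m, m'_y / m), estimated as the ratio of the
    particle-frame acceleration to the observer-frame acceleration."""
    long_pts, trans_pts = [], []
    for tp in (-h, 0.0, h):
        # Longitudinal case: observer-frame worldline whose velocity is -v
        # at t' = 0 (momentarily at rest in the particle frame), with
        # acceleration `accel` along x'.
        x, t = lorentz(-v * tp + 0.5 * accel * tp**2, tp, v, c)
        long_pts.append((t, x))
        # Transverse case: same drift along x', acceleration along y' (y = y').
        _, t = lorentz(-v * tp, tp, v, c)
        trans_pts.append((t, 0.5 * accel * tp**2))
    return (second_derivative(long_pts) / accel,
            second_derivative(trans_pts) / accel)

longitudinal, transverse = mass_factors(0.6)
print(longitudinal, transverse)
```

For v = 0.6 (β = 1.25) the two estimates come out close to β^3 and β^2 respectively, matching the longitudinal and transverse masses obtained above.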

While the particular derivation tactics currently 
implemented or undergoing implementation in the BEGAT 
algorithm might not be directly applicable to other types of 
biases, it is likely that analogues can be found. Research 
toward generalizing BEGAT is described after a review of
related work. 


Related Research 

Within AI, this research is related to scientific discovery 
and theory formation [Shrager and Langley 90], qualitative
physics [Weld and de Kleer 90], change of bias in machine 
learning [Benjamin 90a], and use of group theory 
[Benjamin 90b]. The research in this paper appears to be 
the first addressing the automated rediscovery of scientific 
revolutions of twentieth century theoretical physics. Most 
of the work in scientific theory formation has been on 
incremental theory revision (normal science). Previous 
research on scientific revolutions includes conceptual and 
qualitative accounts of the geological revolution in plate 
tectonics [Thagard and Nowak 90] and the chemical 
revolution of the oxygen theory [O'Rorke, Morris, and
Schulenburg 90]. Recently, [Thagard 92] addressed
automating the comparison of competing theories, and
applied it to comparing Einstein’s relativity theories with
competing theories.

The notions of approximation within qualitative physics 
are closely related to limit homomorphisms. The well 
known calculi for qualitative physics reasoning usually
include some sort of homomorphism from the reals [Forbus 
84] [Kuipers 86]. The use of limits (fitting parameters) to 
define approximation relations between models is 
described in [Weld 89]. Within machine learning, research 
on declarative representations and reasoning about bias is 
most important, see the collection of papers in [Benjamin 
90a]. The research described in this paper is one approach 
to addressing an open problem presented in [Dietterich 91]: 
analytically comparing biases. The declarative bias used in 
theoretical physics is group theory. A good collection of 
papers, many of which focus on the use of group theory in 
AI reasoning and problem solving, is in the workshop 
proceedings [Benjamin 90b]. 

The mathematical model and the research strategy 
presented in this paper are consistent with the physics 
literature. References accessible to the layman include [Zee 
86] and [Davies and Brown 88]. With respect to that 
literature the chief innovations of this paper are the result 
of focusing on the structure of derivations with the aim of 
formalizing them. This focus is peculiar to AI; to the best 
of my knowledge it has not been addressed before. The 
closest previous works may be various pedagogical 
explanations found in textbooks such as [Skinner 82], 
[Taylor and Wheeler 66], and [French 68]. 


Conclusion: Toward Primal-Dual Learning

A hypothesis of this research is that Einstein's strategy 
for mutually bootstrapping between a space of biases and a 
space of theories has wider applicability than theoretical 
physics. Below we generalize the structural relationships of 
the invariance principle which enabled the computational 


steps of Einstein's derivation to succeed. We conjecture 
that there is a class of primal-dual learning algorithms 
based on this structure that have similar computational 
properties to primal-dual optimization algorithms that 
incrementally converge on an optimal value by alternating 
updates between a primal space and a dual space. 

Let $\mathcal{B}$ be a set of biases with ordering relation $\le$ that
forms a lattice. Let $\mathcal{T}$ be a set of theories with ordering
relation $\le$ that forms a lattice. Let $\mathcal{C}$ be a consistency
relation on $\mathcal{B} \times \mathcal{T}$ such that:

\[
\begin{aligned}
\mathcal{C}(b,t) \text{ and } b' \le b &\;\Rightarrow\; \mathcal{C}(b',t)\\
\mathcal{C}(b,t) \text{ and } t' \le t &\;\Rightarrow\; \mathcal{C}(b,t')
\end{aligned}
\]

This definition is the essential property for a well-structured
bias space: as a bias is strengthened, the set of
theories it is consistent with decreases; as a theory is
strengthened, the set of biases it is consistent with decreases.
Hence $\mathcal{C}$ defines a contravariant relation between the
ordering on biases and the ordering on theories.

Let $\mathcal{U}$ be a bias function from theories to biases
($\mathcal{T} \to \mathcal{B}$) such that
$\mathcal{C}(\mathcal{U}(t),t)$ and $\forall b\; \mathcal{C}(b,t) \Rightarrow b \le \mathcal{U}(t)$.
Let $\mathcal{D}$ be a function from $\mathcal{B} \times \mathcal{T} \to \mathcal{B}$
such that $\mathcal{D}(b,t) = b \wedge \mathcal{U}(t)$, where $\wedge$ is
the lattice meet operation.

$\mathcal{D}$ is the DeriveNewBias function, which takes an
upper bound on a bias and filters it with a (new) theory or
observation to obtain a weaker bias. (For some applications
of primal-dual learning, $\mathcal{D}$ should take a lower bound on a
bias and filter it with a new theory or observation to obtain
a stronger bias.) $\mathcal{D}$ is well-defined whenever $\mathcal{U}$ and the
consistency relation have the properties described above. However, depending
on the type of bias, it might or might not be computable. If
it is computable, then it defines the bootstrapping from the
theory space to the bias space when an inconsistency is
detected.
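
The definitions above can be made concrete on a small finite lattice. The following is a minimal sketch under stated assumptions, not the BEGAT implementation: a bias is modeled as a set of elementary assumptions (hypothetical names `a1`, `a2`, `a3`) ordered by inclusion, a theory is a list of observations, and each observation records which assumptions it satisfies. Under this model the lattice meet is set intersection.

```python
from functools import reduce

# Hypothetical universe of elementary assumptions; a bias is a subset of it.
ALL_ASSUMPTIONS = frozenset({"a1", "a2", "a3"})


def upper_bias(theory):
    """Strongest bias consistent with every observation in the theory."""
    return reduce(frozenset.intersection, theory, ALL_ASSUMPTIONS)


def consistent(bias, theory):
    """Consistency relation: the bias assumes nothing the theory violates."""
    return bias <= upper_bias(theory)


def derive_new_bias(bias, theory):
    """DeriveNewBias: meet of the old bias with the theory's upper bias."""
    return bias & upper_bias(theory)


old_bias = frozenset({"a1", "a2"})
old_theory = [frozenset({"a1", "a2", "a3"}), frozenset({"a1", "a2"})]
new_theory = old_theory + [frozenset({"a2", "a3"})]  # new observation violates a1
new_bias = derive_new_bias(old_bias, new_theory)
```

In this toy model weakening a bias or dropping observations from a theory never destroys consistency, which mirrors the two implications that define the consistency relation.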

The bootstrapping of BEGAT from a new bias to a 
new theory that has a limiting approximation to the old 
theory requires two capabilities. First, given the old bias 
and the new sibling bias, the restriction of the old theory to 
those instances compatible with the new bias must be 
defined and computable. Second, given this restriction, its 
generalization by the new bias must also be defined and 
computable. 

As an example of BEGAT with a different type of 
bias, consider the problem of learning to predict a person's 
native language from attributes available in a data base. We 
will assume that one’s native language is the same as the 
language spoken by one’s mother, but that the mother's 
language is not in the data base. A declarative 
representation for biases that includes functional 
dependencies was presented in [Davies and Russell 87] and 
subsequent work. Let the original bias be that the native 
language is a function of the birth place. This bias would 
likely be consistent with data from Europe, but might be 
inconsistent with the data from the U.S. because of its large
immigrant population. Assume that a function $\mathcal{D}$ derives a
new bias where the native language is a function of the
mother's place of origin. The following limit
homomorphism formalizes the intersection of the original
bias and the new bias:

\[
\lim_{\#\text{immigrants} \to 0} \text{mother's-origin}(x) = \text{birth-place}(x)
\]

The restriction of the original theory to concepts 
derived from the limiting value (e.g. non-immigrant data) is 
compatible with this new bias. Furthermore, the concepts 
learned from this restricted set can be transferred directly to 
the new theory by substituting the value of the birth place 
attribute into the value for the mother's place of origin. 
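
The restriction and transfer steps of this example can be sketched as follows. The record fields (`birth_place`, `mother_origin`, `language`) and the sample data are hypothetical, and a lookup table stands in for whatever concept learner is actually used.

```python
def restrict_to_limit(records):
    """Keep records at the limiting value: mother's origin equals birth place."""
    return [r for r in records if r["mother_origin"] == r["birth_place"]]


def learn_old_theory(records):
    """Old theory: native language as a function of birth place."""
    return {r["birth_place"]: r["language"] for r in records}


def transfer(old_theory):
    """New theory: the same table, now keyed by mother's place of origin."""
    return dict(old_theory)


def predict(new_theory, record):
    return new_theory.get(record["mother_origin"])


records = [
    {"birth_place": "France", "mother_origin": "France", "language": "French"},
    {"birth_place": "Italy", "mother_origin": "Italy", "language": "Italian"},
    {"birth_place": "US", "mother_origin": "US", "language": "English"},
    {"birth_place": "US", "mother_origin": "Italy", "language": "Italian"},
]
new_theory = transfer(learn_old_theory(restrict_to_limit(records)))
```

The last record is inconsistent with the original bias, but it is predicted correctly once the concepts learned from the non-immigrant records are keyed by the mother's place of origin.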

Future research will explore the theory and application 
of primal-dual learning to theoretical physics and other 
domains. Given the spectacular progress of twentieth 
century physics, based on the legacy of Einstein's research 
strategy, the computational advantages of machine learning 
algorithms using this strategy might be considerable. 
Abstract

One way to build a reactive system is to construct an action table
indexed by the current situation or stimulus. The action table
describes what course of action to pursue for each situation or
stimulus. This paper describes an incremental approach to constructing
the action table through achieving goals with a hierarchical
search system. These hierarchies are generated with transformations
called concretizations, which add constraints to a problem
and which can reduce the search space. The basic idea is that an
action for a state is looked up in the action table and executed
whenever the action table has an entry for that state; otherwise,
a path is found to the nearest (cost-wise, in a graph with
cost-weighted arcs) state that has a mapping from a state in the next
highest hierarchy. For each state along the solution path, the
successor state in the path is cached in the action table entry for that
state. Without caching, the hierarchical search system can
logarithmically reduce search. When the table is complete the system
no longer searches: it simply reacts by proceeding to the state
listed in the table for each state. Since the cached information
is specific only to the nearest state in the next highest hierarchy
and not the goal, inter-goal transfer of reactivity is possible. To
illustrate our approach, we show how an implemented hierarchical
search system can become completely reactive.

1 Introduction and Motivation 

Intelligent interaction with the world can be viewed as 
a combination of planning to achieve some goal and of 
reaction to external stimuli in the course of executing 
a plan. A pure planning system produces a complete
plan of actions before executing it [4, 8, 3, ;]. In 
contrast, a pure reactive system quickly selects and ex- 
ecutes a single action based on an external stimulus [2, 
1, 6]. Planning systems appear to work well when the 
predictability of the world is precisely captured in the 
planner’s actions, whereas reactive systems appear to 
work well in worlds that are fraught with uncertainty 
or unpredictability — where plans have little chance of 
succeeding in their entirety, where the ability to plan to 
completion is not a virtue. This paper describes how a
planning system can incrementally become more reactive
through interaction with its world. By becoming
more reactive, the system reduces its decision-making
time.

Previous approaches to building reactive systems 
from non-reactive ones include compilation and learn- 
ing from examples. Firby [5] and Rosenschein [13] 
show how to compile high-level input descriptions of 


actions and goals into reactive systems. Similarly, 
Rosenschein and Kaelbling describe a technique to 
compile constraint expressions into directly executable 
circuits for a robotic control system [14]. Mitchell 
uses Explanation-Based Learning to incrementally 
learn the general conditions under which a particu- 
lar action, which helps achieve a particular planning 
goal, should be applied [10]. If the conditions are 
matched, the same action is applied — irrespective of 
the system’s current goal. The advantage of learning 
over compiling is that examples focus on those parts 
of the environment with which an intelligent agent ac- 
tually interacts; only those actions that are relevant to 
that interaction are compiled for reactivity. 

The problem with the Explanation-Based Learning 
approach is that multiple goals can lead to multiple 
action suggestions for the same state, which results in 
deliberation as to which action to apply and therefore 
less reactivity. This anomaly is commonly called 
the wandering bottleneck problem in the machine 
learning literature; as a result of eliminating one time 
bottleneck (e.g. time taken to react) another one 
unexpectedly arises (e.g. time taken to decide how to 
react). More precisely, in a problem with n problem-solving
states, each state can have as many as n
possible action suggestions since there can be as many
as n goals from which the action suggestions are
learned. Moreover, to store such a network of states
and actions can require as much as $O(n^2 \log n)$ space
over n goals and n states, since $O(\log n)$ space is
required to store each action suggestion. If n is
exponential in the problem size, then this approach is
generally not feasible.

This paper describes a technique to avoid the wan- 
dering bottleneck problem by hierarchically organiz- 
ing the state space such that at most one action is 
learned for each state. As a side benefit, this hierarchical
organization reduces the worst-case space
requirements by a factor of n.
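
The lookup-then-search behaviour summarized in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions rather than the implemented system: the graph is given explicitly as a dictionary of weighted arcs, and `landmarks` is a hypothetical stand-in for the states that have a mapping from the next highest hierarchy.

```python
import heapq


def nearest_landmark_path(graph, start, landmarks):
    """Uniform-cost search from start to the cheapest-to-reach landmark."""
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state in seen:
            continue
        seen.add(state)
        if state in landmarks and state != start:
            return path
        for succ, step in graph.get(state, {}).items():
            if succ not in seen:
                heapq.heappush(frontier, (cost + step, succ, path + [succ]))
    return None


def next_state(graph, table, state, landmarks):
    """React from the table if possible; otherwise search once and cache."""
    if state not in table:
        path = nearest_landmark_path(graph, state, landmarks)
        if path is None:
            return None
        for here, there in zip(path, path[1:]):
            table.setdefault(here, there)  # at most one entry per state
    return table[state]


graph = {"a": {"b": 1, "c": 5}, "b": {"c": 1}, "c": {}}
table = {}
first_move = next_state(graph, table, "a", {"c"})
```

After the first call the table holds one successor for each state on the path, so later calls for those states involve no search.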

The rest of this paper is organized as follows. Sec- 
tion 2 defines the notion of a concretization and de- 
rives several important properties of concretizations in 
search. Section 3 describes our approach to becoming
reactive by concretization. Section 4 presents experi- 
mental results of applying our approach. Finally, Sec- 
tion 5 summarizes the conclusions of this work and 
discusses a few promising avenues of future research. 

2 Concretizations 

Intuitively, a concretization of a problem is one that 
has added constraints. The importance of these added 
constraints is that they reduce the branching factor dur- 
ing search. To formalize this intuitive notion requires 
a definition of search. The definition that we will as- 
sume is standard in the AI literature [11]. A search 
problem can be thought of as consisting of a graph of 
nodes, which represent states, and directed arcs that 
represent the application of an operator. These arcs 
are typically weighted to represent the cost of apply- 
ing the corresponding operator. Search can be thought 
of as finding a finite path in the graph from a node rep- 
resenting a given initial state to a node representing a 
given goal state. The graph can be specified explicitly 
or implicitly. In an explicit specification, the nodes 
and arcs with associated costs might be supplied in 
a table that includes every node in the graph and a 
list of its successors and the costs of associated arcs. 
This information might also be specified by a matrix 
that stores the costs of associated arcs for every pair 
of nodes (an infinite cost arc represents the absence of 
an arc). In an implicit specification, only that portion 
of the graph that is sufficient to include a goal node 
is made explicit by applying operators using a search 
algorithm such as A* [11]. For example, in the Eight
Puzzle problem, the set of states consists of all tile 
permutations and operators only allow swapping the 
blank with an adjacent tile (i.e. the cost function on a 
pair of states returns 1 if one state is reachable from 
the other by swapping the blank with an adjacent tile, 
and $\infty$ otherwise). The goal state might specify that
the tiles are in a particular order.
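
An implicit specification of the Eight Puzzle can be sketched as a successor function that generates arcs on demand. The state encoding (a 9-tuple read row by row, with 0 for the blank) is an assumption made for this illustration.

```python
def successors(state):
    """Yield (next_state, cost) pairs for a 9-tuple state; 0 is the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            tile = 3 * r + c
            nxt = list(state)
            nxt[blank], nxt[tile] = nxt[tile], nxt[blank]
            yield tuple(nxt), 1


def cost(s, t):
    """1 if t is reachable from s by one blank swap, infinity otherwise."""
    return 1 if any(n == t for n, _ in successors(s)) else float("inf")


center_blank = (1, 2, 3, 4, 0, 5, 6, 7, 8)
corner_blank = (0, 1, 2, 3, 4, 5, 6, 7, 8)
```

Only the portion of the graph that a search algorithm actually visits is ever constructed this way.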

More formally, let a search problem be a 3-tuple
$(S, c)$, where $S$ is a set of states describing situations
of the world; $c : S \times S \to \mathbb{R}^{+}$ is a positive cost function
that represents the cost of applying the corresponding
action from one state to another, and $G \subseteq S$ is a set of
goal states. An instance of a search problem includes
a 2-tuple $(i, g)$ where $i \in S$ is the initial state and
$g \in S$ is the goal state (for simplicity, we assume that
there is only one goal state). The objective is to find
a finite length finite cost path from $i$ to $g$.


A problem $(S', c')$ is a concretization of another
problem $(S, c)$ with respect to $\phi : S' \to S$ iff $\phi$
reduces cost: $(\forall s', t' \in S')\; c(\phi(s'), \phi(t')) \le c'(s', t')$.
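
For small, explicitly specified problems the definition can be checked by enumeration. The sketch below assumes cost functions that return infinity for absent arcs; the three-state graphs are hypothetical examples and are not taken from the paper.

```python
from itertools import product

INF = float("inf")


def is_concretization(concrete_states, concrete_cost, original_cost, phi):
    """Check c(phi(s'), phi(t')) <= c'(s', t') for all s', t' in S'."""
    return all(
        original_cost(phi(s), phi(t)) <= concrete_cost(s, t)
        for s, t in product(concrete_states, repeat=2)
    )


def cost_from(arcs):
    """Build a cost function from a dictionary of weighted arcs."""
    return lambda s, t: arcs.get((s, t), INF)


states = ["A", "B", "C"]
original = cost_from({("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1})
restricted = cost_from({("A", "B"): 1, ("B", "C"): 1})  # arc A -> C removed


def identity(s):
    return s
```

Removing an arc raises its cost to infinity, so the restricted problem is a concretization of the original, but not the other way around.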

For example, Figure 1 shows a concretization of 
the Towers of Hanoi problem. The original problem 
is composed of operators that stack smaller disks on 
top of larger disks from pin to pin; states are simply 
disks stacked in increasing size on various pins. The 
initial and goal states for a typical three disk instance 
of the Towers of Hanoi problem are also shown in 
the figure. If the disks are numbered from top to bottom
and then the operators are constrained such that
they never place an odd-numbered disk on an even-numbered
disk and vice versa, then this new problem
is a concretization of the original problem with respect
to a mapping function that ignores disk parity. The
reason is that the cost is reduced: operators apply
more often in the original problem. Notice that any solution
in the concrete space is guaranteed to be a solution
in the original space because the concretized problem
is more restricted. Since the branching factor will
be lower for the concretized problem, solution gen- 
eration will be more efficient (though slightly longer 
solutions will generally result). This property, which 
we call solution-soundness , is perhaps most powerful 
when a problem can be concretized into one for which 
an efficient solution generator exists. Any solution to 
the concretized problem can then be directly mapped 
onto a solution to the original problem. For example, 
a Blocks World problem with three table locations can 
be concretized into a Towers of Hanoi problem, which 
has an associated divide-and-conquer algorithm, by assigning
a “size” to each block (say, small to large for
each block on every stack, consistent in the initial and 
goal states). Any solution to the corresponding Tow- 
ers of Hanoi problem can be mapped onto a solution 
to the original problem simply by ignoring size. 

Tenenberg describes a similar property, which he
calls the downward solution property, in the context 
of planning with a certain type of operator representa- 
tion [16]. In his terminology, a transformed problem 
has the downward solution property if every solution 
in the transformed space can be mapped onto one for 
the original problem. Solution soundness is a gener- 
alization of the downward solution property since it 
does not depend on specific operator representations. 

Despite the solution-soundness property of con- 
cretizations, a solvable problem in the original space 
Figure 4 Becoming Reactive Through Interaction with the World


only those states that are most frequently encountered 
or apply learning techniques to reduce table size. In 
particular, we are currently investigating applying our
ideas to a less artificial problem, a robot routing task,
which includes explicitly specified operators with
inputs from external sensors. It might be possible to
apply Explanation-Based
or inductive learning to learn the class of states that
lead to the nearest state in the next highest hierarchy.

Another problem is that constructing concretization 
hierarchies is generally a difficult problem. However, 
a catalog of problem transformations such as those of 
Absolver II [12] might prove helpful. Another method 
might be to use clustering algorithms to group simi- 
lar states into equivalence classes. Problem-solving 
performance with more meaningful groupings — those 
that exploit the structure of the search graph and sim- 
ilarity of states — should be improved over the results 
we obtained with random hierarchical groupings. 

Ultimately, we would like to test our ideas in a 
dynamic world where an intelligent agent’s plans to 
achieve goals are continually thwarted by unforeseen 
events to which the agent has to react immediately, 
recover, and then proceed towards achieving the goal. 
We believe that a hierarchical learning system of the 
sort described here may be especially suited for such 
worlds. We are currently modeling a dynamic world 


and testing this hypothesis. 
Abstract

KBSDE is a knowledge compiler that uses a
classification-based approach to map solution constraints
in a task specification onto particular
search algorithm components that will be responsible
for satisfying those constraints (local
constraints are incorporated in generators; global
constraints are incorporated in either testers or
hillclimbing patchers). Associated with each type
of search algorithm component is a subcompiler
that specializes in mapping constraints into components
of that type. Each of these subcompilers
in turn uses a classification-based approach,
matching a constraint passed to it against one of
several schemas, and applying a compilation technique
associated with that schema.

While much progress has occurred in our research 
since we first laid out our classification-based ap- 
proach [Ton91], we focus in this paper on our re- 
formulation research. Two important reformula- 
tion issues that arise out of the choice of a schema- 
based approach are: 

• Compilability. Can a constraint that does not
directly match any of a particular subcompiler’s 
schemas be reformulated into one that does? 

• Efficiency. If the efficiency of the compiled 
search algorithm depends on the compiler’s per- 
formance, and the compiler’s performance de- 
pends on the form in which the constraint was 
expressed, can we find forms for constraints 
which compile better, or reformulate constraints 
whose forms can be recognized as ones that 
compile poorly? 

In this paper, we describe a set of techniques we 
are developing for partially addressing these is- 
sues. 

Introduction 

Because we have described KBSDE more extensively 
elsewhere [Ton91], our introduction to the basic ideas 
behind KBSDE will be relatively brief. 


Room lengths must be at least minValue1.                  (MINL)
Room widths must be at least minValue2.                   (MINW)
Rooms must be inside the floorplan.                       (INS)
Rooms must be adjacent to the floorplan boundary.         (ADJ)
Rooms must not overlap.                                   (NONOV)
The rooms must completely fill the floorplan space.       (FILL)


Figure 1: Constraints on house floorplans 


Task specifications. KBSDE accepts task specifications
that can be expressed in the form:

    Synthesize(i : I, o : O) | P(o)

where i is the input defining a particular problem,
o is a candidate solution, O (the type of the object o
to be synthesized) defines the space of candidate solutions,
and P(o) is a predicate on o that must be satisfied.
P(o) is expressed as a conjunction of simpler
constraints.

Many design tasks can be specified in this manner.
For example, the specification for a simple house floorplanning
task might look like:

    Synthesize(< l : houseLength, w : houseWidth,
                 n : NbrRooms >, fp : Floorplan)
        | AcceptableFloorplan(fp)

where AcceptableFloorplan(fp) is the conjunction
of the constraints listed in Figure 1.

Algorithm descriptions. KBSDE’s top-level clas- 
sification partitions the conjoined constraints in P(o) 
into (mutually exclusive) subsets Pi(o) of constraints, 
where each subset is to be satisfied by a distinct al- 
gorithm component (either a constrained generator, a 
tester, or a hillclimbing patcher). Prototype heuristics 
for assigning constraints to algorithm components are 
discussed in [Ton91]. 

One example of a partitioning of the constraints in 
Figure 1 among a set of algorithm components is: 
    Synthesize(< l : HLength, w : HWidth,
                 n : NbrRooms >, fp : Floorplan)
      Generate(fp | MINL(fp) ∧ MINW(fp)
                    ∧ INS(fp) ∧ ADJ(fp));
      if Test(fp | NONOV(fp) ∧ FILL(fp)) fails
      then Patch(fp | NONOV(fp) ∧ FILL(fp));
      Return(fp).


The intended semantics of this syntax is: generate an 
s; if the tested constraints fail, then try to patch; if 
patching fails, chronologically backtrack to the gener- 
ator. Later references to Tests in this paper are in- 
tended to have the same semantics with respect to test 
failure and backtracking to preceding generators. 
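
The control regime just described can be sketched as follows. This is an illustrative reading of the stated semantics, not KBSDE's compiled output: the generator is a Python generator, so resuming it plays the role of chronological backtracking, and the toy generator, tester and patcher are hypothetical stand-ins.

```python
def synthesize(generate, test, patch):
    """Generate a candidate; if the test fails, try to patch; else backtrack."""
    for candidate in generate():  # resuming the generator = backtracking
        if test(candidate):
            return candidate
        patched = patch(candidate)  # returns None when patching fails
        if patched is not None and test(patched):
            return patched
    return None


def odd_numbers():
    """Constrained generator: the local constraint (oddness) is built in."""
    yield from range(1, 20, 2)


def divisible_by_seven(x):
    """Tester for a global constraint."""
    return x % 7 == 0


def bump(x):
    """Toy patcher: one repair step that keeps the candidate odd."""
    return x + 2 if (x + 2) % 7 == 0 else None


result = synthesize(odd_numbers, divisible_by_seven, bump)
```

Here the candidates 1 and 3 fail both the test and the patch, so control returns to the generator; the candidate 5 is patched to 7, which passes the test.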

Algorithms themselves can be partitioned across sev- 
eral levels of abstraction. For example, 


    Synthesize(< l : HLength, w : HWidth,
                 n : NbrRooms >, fp : Floorplan)

      [high level]
      Generate(< a_1 : area(Room), ...,
                 a_n : area(Room) >);
      Test(< a_1, ..., a_n > | l * w = a_1 + a_2 + ... + a_n);

      [low level]
      For i = 1 to n do
        Generate(r_i : Room | MINL(r_i)
                 ∧ MINW(r_i) ∧ INS(r_i)
                 ∧ ADJ(r_i) ∧ Area(r_i) = a_i);
      rooms(fp) ← < r_1, ..., r_n >;
      if Test(fp | NONOV(fp) ∧ FILL(fp)) fails
      then Patch(fp | NONOV(fp) ∧ FILL(fp));
      Return(fp).


Constraints are thus partitioned across levels as well as 
across the algorithm components within each level. In 
addition, inter-level constraints (such as Area(r_i) = a_i)
are dynamically posted to ensure that a solution
generated at the next level down is a refinement of the 
solution at the current level. 

Reformulating constraints for 
incorporation into generators 

The RICK subcompiler (the subject of Wes Braud- 
away’s Ph.D. work [Bra91b, Bra91a]) of KBSDE spe- 
cializes in incorporating local solution constraints c 
into generators of all instances s of a given type T. 
The result is a constrained generator whose computed
range of values is guaranteed to include only those in- 
stances of T that satisfy c; this is typically accom- 
plished by either changing the lower or upper bounds 
of the range, or pruning out inconsistent values and 
caching the remainder. Note that T can be non- 
numerical, and hierarchical in structure (e.g., a rect- 
angular floorplan is composed of rectangular rooms; 


each room is defined in terms of 4 parameters: < 

>). 

Compilability 

The structure mismatch problem. The generator 
structure of an algorithm is its internal organization
of generator components. Different generator struc- 
tures can be constructed for the same task. Some gen- 
erator structures do not allow the incorporation of all 
constraints. We refer to this as the structure mismatch 
problem.

The structure mismatch problem is actually a family 
of problems, as illustrated in Figure 2, which indicates 
all the activities involved in RICK’s process of design- 
ing a constrained generator. Each such decision ((a) 
through (h)) can be “bollixed” by its own unique struc- 
ture mismatch problem. For example, decision (a) - 
partitioning requirements - takes a set of requirements 
R(s) and decides which will be treated as local, type- 
defining constraints T(s), which will be considered as 
semi-local constraints C(s) on interfacing parts of s, 
and which will be considered global constraints P(s); 
which constraints can be treated as type-defining depends
on the known datatypes.

The RICK subcompiler avoids the structure mis- 
match problems associated with decision (e) (the 
choice of low-level object structure), decision (g) (the 
choice of composition architecture), and decision (h) 
(the choice of control flow) by using a least commit- 
ment approach to top-down refinement of the genera- 
tors that is constrained by the constraints to be incor- 
porated [Bra91b]. 

Thus for these decisions, RICK avoids the need for
a reformulation-based approach to the structure mismatch
problem. However, RICK does require reformulation
for a particular special case of decision (e), which
we now describe.

Reformulation to eliminate terminological mis- 
match between constraint and generator. If a 
constraint c refers to an object obj that is semanti- 
cally a dynamically generated part of solution o (e.g., 
the points inside rectangle R) but that does not appear 
syntactically as a part or parameter of o (according 
to the given type definition T), then c cannot be directly
incorporated into the generator of all instances
of type T (since the hierarchical structure of the gen- 
erator procedure is directly lifted from the syntactic 
structure of T). Incorporation is enabled by using re- 
formulation techniques that reexpress c solely in terms 
of the defined parts and parameters of T. 

For example, constraint INS, “Rooms must be inside 
the floorplan”, might originally be expressed as “If a 
point is inside a room, then that point is also inside
the floorplan”: 

\[
\begin{aligned}
\forall R, P\, [[&\mathrm{Room}(R) \wedge \mathrm{Point}(P) \wedge x(sw(R)) < x(P) < x(ne(R)) \wedge y(sw(R)) < y(P) < y(ne(R))] \\
\rightarrow [&x(sw(\mathit{floorplan})) < x(P) < x(ne(\mathit{floorplan})) \wedge y(sw(\mathit{floorplan})) < y(P) < y(ne(\mathit{floorplan}))]]
\end{aligned}
\]

[Figure 2 diagram: Generate(s | R(s)) is refined through knowledge-level
decisions ((a) Partitioning Requirements; (b) Object Selection;
(c) Object Structure; (f) Artifact Architecture; (i) Concept Formulation;
(j) Referenced Architecture; (k) Terminology) into Synthesis Knowledge T(s),
Composition Knowledge C(s) and Analysis Knowledge P(s), and through
function-level decisions ((d) Generator Order; (e) Low-level Structure;
(g) Composition Architecture; (h) Control Flow) into the steps Generate
Components, Compose Components and Test Artifact Candidate.]

Figure 2: Design decisions defining a family of structure mismatch problems


The problem is that T, the part decomposition for 
objects of type Floorplan, does not include points P 
in the interior of a room. The RICK system uses the 
transitivity of the “<” relation to hypothesize a plau- 
sible reformulation of the above constraint that does 
not refer to points P, and instead, simply constrains 
the extreme points of room R: 


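\[
\begin{aligned}
\forall R\, [\mathrm{Room}(R) \rightarrow [\,&x(sw(\mathit{floorplan})) < x(sw(R)) < x(ne(\mathit{floorplan})) \;\wedge \\
&y(sw(\mathit{floorplan})) < y(sw(R)) < y(ne(\mathit{floorplan})) \;\wedge \\
&x(sw(\mathit{floorplan})) < x(ne(R)) < x(ne(\mathit{floorplan})) \;\wedge \\
&y(sw(\mathit{floorplan})) < y(ne(R)) < y(ne(\mathit{floorplan}))]]
\end{aligned}
\]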


RICK then uses a standard theorem-proving technique
to verify that this hypothesized reformulation is a necessary
condition for the original constraint.
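
The relationship between the point-based and the corner-based forms of INS can be illustrated by brute force on integer grids. This sketch is not RICK's theorem-proving step. It assumes axis-aligned rectangles encoded as (x_sw, y_sw, x_ne, y_ne) tuples, and it uses non-strict comparisons so that a room may touch the floorplan boundary; both choices are assumptions of the example.

```python
def corners_inside(room, floorplan):
    """Corner-based INS: only the sw and ne corners of the room are checked."""
    rx0, ry0, rx1, ry1 = room
    fx0, fy0, fx1, fy1 = floorplan
    return (fx0 <= rx0 <= fx1 and fy0 <= ry0 <= fy1
            and fx0 <= rx1 <= fx1 and fy0 <= ry1 <= fy1)


def grid_points_inside(room, floorplan):
    """Point-based INS, checked on every integer grid point of the room."""
    rx0, ry0, rx1, ry1 = room
    fx0, fy0, fx1, fy1 = floorplan
    return all(
        fx0 <= x <= fx1 and fy0 <= y <= fy1
        for x in range(rx0, rx1 + 1)
        for y in range(ry0, ry1 + 1)
    )


floorplan = (0, 0, 4, 4)
rooms = [(1, 1, 3, 2), (3, 3, 5, 4), (0, 0, 4, 4)]
agreement = [
    corners_inside(r, floorplan) == grid_points_inside(r, floorplan)
    for r in rooms
]
```

On these samples the two forms agree, while the corner-based form refers only to parameters that appear in the type definition of a room.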


Reformulation by eliciting simplifying assump- 
tions. Because RICK uses the simplex method to 
check the consistency of a set of constraints (a neces- 
sary step along the way to constructing a constrained 
generator satisfying those constraints), all such con- 
straints must ultimately be expressible in a linear alge- 
braic form in order for compilation to proceed. Some- 
times, however, constraints that could be reformulated 
as linear constraints depend for their reformulation 
upon knowledge not available in RICK’s knowledge 
base. 

For example, in our house floorplanning example, 
floorplans and rooms are defined to be rectangular. 
Rectangles in general need not be aligned horizontally 
or vertically in the Cartesian plane; thus, the type def- 
inition for a general rectangle will have associated the 
nonlinear constraint: 

\[
[x(nw(R)) - x(sw(R))] \cdot [x(se(R)) - x(sw(R))] = [y(nw(R)) - y(sw(R))] \cdot [y(se(R)) - y(sw(R))]
\]

However, were RICK to know that:

\[
x(nw(R)) = x(sw(R))
\]

i.e., we can consider the rectangle to be vertically
aligned with the y axis, then because (non-degenerate)
rectangles also have the associated constraint:

\[
y(sw(R)) < y(nw(R))
\]

the constraint could be reduced to the linear constraint:

\[
y(se(R)) = y(sw(R))
\]

i.e., “the rectangle is horizontally aligned with the x
axis.”
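
The reduction can be spelled out step by step. The following worked sketch uses only the rectangle constraint and the two facts stated in the text:

```latex
\begin{aligned}
&[x(nw(R)) - x(sw(R))] \cdot [x(se(R)) - x(sw(R))]
   = [y(nw(R)) - y(sw(R))] \cdot [y(se(R)) - y(sw(R))] \\
&\text{substituting } x(nw(R)) = x(sw(R)):\quad
   0 = [y(nw(R)) - y(sw(R))] \cdot [y(se(R)) - y(sw(R))] \\
&\text{since } y(sw(R)) < y(nw(R)):\quad
   y(nw(R)) - y(sw(R)) \neq 0
   \;\Rightarrow\; y(se(R)) - y(sw(R)) = 0
\end{aligned}
```

The assumption removes the product on one side, and the non-degeneracy constraint rules out the first factor on the other, leaving a linear equality.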

If at any point, compilation comes to a halt be- 
cause the only constraints left to compile are nonlin- 
ear, RICK consults the user, by presenting a list of 
plausible simplifying assumptions. These simplifying
assumptions are generated by heuristics that examine 
a nonlinear constraint and consider what would have 
to be true in order for it to be reducible to a linear con- 
straint. Thus if (x - y) appears in a product, “x = y” 


would simplify the constraint if it were true; if xy ap- 
pears in a sum, x = 1/y would simplify the constraint. 

The generation (and selection/verification by the
user) of such simplifying assumptions is intended to
mimic the form of mathematical reasoning, “Without
loss of generality, let us assume ...” The user is involved
in this process because, in general, actual verification
of such simplifying assumptions requires knowledge
that is not available in the system’s knowledge
base.

Efficiency improvement 

RICK’s task is to construct, for a given constraint c(o), 
where o is of type T, a constrained generator of objects 
of type T that are guaranteed to satisfy c. Thus, no 
matter how it chooses to represent solutions or incor- 
porate the constraint, if RICK fully incorporates c into 
the generator, the set of solutions generated will always 
be the same. Since RICK does not reason about “low-level”
issues such as choice of data structure for solutions,
the primary issue regarding efficiency is whether
the constructed constrained generator - which sequen- 
tially produces all the members of the set {o | c(o)} - 
is producing duplicate candidate solutions in that se- 
quence. 

One reformulation technique used by RICK to help 
reduce the construction of redundant solutions is based 
on knowing that if RICK is passed a constraint of the 
form: 

\[
\forall x, y\, [P \rightarrow x = y]
\]

it will “operationalize” this by constructing generators 
for objects x and y, generate one object first (say, x), 
and then construct y to be an exact copy of x. RICK 
avoids this undesirable behavior by looking for such 
constraints, forming their logical contrapositive (ex- 
cept for the type-defining terms), and then reexpress- 
ing the constraint in a canonical form. 

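For example, the NONOV constraint, “Rooms must
not overlap”, might originally be expressed as:

\[
\begin{aligned}
\forall R1, R2, P\, [&\mathrm{room}(R1) \wedge \mathrm{room}(R2) \wedge \mathrm{point}(P) \\
&\wedge\; \mathrm{strictlyInside}(P, R1) \wedge \mathrm{strictlyInside}(P, R2) \rightarrow R1 = R2]
\end{aligned}
\]

The contrapositive is:

\[
\begin{aligned}
\forall R1, R2, P\, [&\mathrm{room}(R1) \wedge \mathrm{room}(R2) \wedge \mathrm{point}(P) \wedge R1 \neq R2 \\
&\rightarrow \neg\,\mathrm{strictlyInside}(P, R1) \vee \neg\,\mathrm{strictlyInside}(P, R2)]
\end{aligned}
\]

which is then put in canonical form:

\[
\begin{aligned}
\forall R1, R2, P\, [&\mathrm{room}(R1) \wedge \mathrm{room}(R2) \wedge \mathrm{point}(P) \\
&\wedge\; \mathrm{strictlyInside}(P, R1) \wedge R1 \neq R2 \rightarrow \neg\,\mathrm{strictlyInside}(P, R2)]
\end{aligned}
\]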

Reformulating constraints for more 
efficient function evaluation 

The MENDER subcompiler (the subject of Kerstin
Voigt’s Ph.D. work [VT89]) of KBSDE specializes in
incorporating global constraints c into hillclimbing
patchers; these patchers take a candidate solution s
that fails test c, and iteratively modify s, one
parameter value at a time, in such a way that
improvement occurs with respect to an evaluation
function f. f is constructed by MENDER so that there
is some value k such that when hillclimbing reaches
f(s) ≥ k, c(s) is simultaneously satisfied. For
example, the FILL constraint, “The rooms must
completely fill the floorplan space”, is completely
satisfied when f(s) = HLength · HWidth, where f(s) is
the number of 1×1 “unit tiles” in the floorplan that
are covered by rooms.
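
As a rough illustration of the FILL evaluation function and of patching “one parameter value at a time”, the following Python sketch may help. It is not MENDER’s implementation: the room encoding (x, y, width, height), the names covered_tiles, improving_neighbor and hillclimb, and the first-improvement search order are all assumptions made for this example, and the sketch ignores every constraint other than FILL.

```python
from typing import List, Optional, Tuple

Room = Tuple[int, int, int, int]  # hypothetical encoding: (x, y, width, height)

def covered_tiles(rooms: List[Room], h_length: int, h_width: int) -> int:
    """f(s): number of 1x1 unit tiles of the floorplan covered by some room."""
    covered = set()
    for x, y, w, h in rooms:
        for tx in range(max(x, 0), min(x + w, h_length)):
            for ty in range(max(y, 0), min(y + h, h_width)):
                covered.add((tx, ty))
    return len(covered)

def improving_neighbor(rooms: List[Room], score: int,
                       h_length: int, h_width: int) -> Optional[List[Room]]:
    """Change one parameter of one room by +/-1; return the first improvement."""
    for i, room in enumerate(rooms):
        for p in range(4):
            for delta in (1, -1):
                changed = list(room)
                changed[p] += delta
                trial = rooms[:i] + [tuple(changed)] + rooms[i + 1:]
                if covered_tiles(trial, h_length, h_width) > score:
                    return trial
    return None

def hillclimb(rooms: List[Room], h_length: int, h_width: int) -> List[Room]:
    """Patch until f(s) reaches k = h_length * h_width or no move improves f."""
    rooms = list(rooms)
    k = h_length * h_width
    while True:
        score = covered_tiles(rooms, h_length, h_width)
        if score >= k:
            return rooms
        better = improving_neighbor(rooms, score, h_length, h_width)
        if better is None:
            return rooms  # stuck at a local optimum
        rooms = better
```

Because f strictly increases on every accepted move and is bounded by k, the loop terminates; it may still stop short of k at a local optimum.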

MENDER handles those global constraints that can 
be viewed as resource assignment problems (RAPs) 
involving assigning a fixed set of resource units to a 
dynamically generated set of consumers in a specified 
way. For example, the FILL constraint can be viewed 
as a RAP wherein the resources are unit tiles in
the rectangular floorplan, the consumers are rectangu- 
lar rooms and the required assignment is that each re- 
source unit be assigned (at least) one consumer. (Note 
that other constraints such as NONOV and INS en- 
sure that each resource unit is assigned exactly one 
consumer.) 

Compilation is based on a taxonomy of resource
assignment problem schemas, where what varies
across the schemas is the nature of the assignment
(one-to-one, onto, etc.). Associated with each schema
is a method for constructing an evaluation function
appropriate for that kind of RAP. Thus the schema for
an “onto” assignment (such as FILL) has an associated 
evaluation function which counts the total number of 
resource units “covered” by consumers in a particular 
state. 

Efficiency improvement 

The RAP schemas can be organized into a specialization
lattice. More specialized schemas place more
constraints on the assignment; because more is
therefore known about such RAPs, their evaluation
functions can often be evaluated more efficiently. For example, the most
specialized RAP, where the relation between resource 
units and consumers is “one-to-one” and “onto” (e.g., 
as between unit tiles in the floorplan and unit tiles in 
the room rectangles), can take advantage of the fact 
that all the consumers must have associated resource 
units assigned to them. (The nature of the overall 
search algorithm architecture in which the hillclimbing 
patcher is embedded guarantees that the patcher will 
be passed a candidate solution that satisfies the “one- 
to-one” constraint, though not necessarily the “onto” 
constraint.) It further relies on the common occurrence 
of a strictly hierarchical structure in the consumer or- 
ganization (e.g., the consumer unit tiles are grouped 
into rectangular rooms). On the basis of these facts, 
the associated evaluation function can count the
total number of “covered” resource units by counting
the total number of assigned consumer units, which
is the same as summing the sizes of the (mutually
exclusive) consumer groups into which the consumer
units are partitioned. Thus, if the FILL constraint
were viewed as an instance of this RAP schema, the
associated evaluation function would add the areas of
the placed rooms (which are architecturally guaranteed
to be inside the house and non-overlapping).
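
The difference between the two evaluation functions can be sketched in Python as follows. This is only an illustration under assumed names (count_covered_tiles, sum_of_areas) and an assumed room encoding (x, y, width, height) on an integer grid; it is not taken from MENDER.

```python
from typing import List, Tuple

Room = Tuple[int, int, int, int]  # hypothetical encoding: (x, y, width, height)

def count_covered_tiles(rooms: List[Room]) -> int:
    """General "onto" evaluation: enumerate and count distinct covered tiles."""
    return len({(x + dx, y + dy)
                for x, y, w, h in rooms
                for dx in range(w)
                for dy in range(h)})

def sum_of_areas(rooms: List[Room]) -> int:
    """Specialized evaluation: one multiplication per room.

    Agrees with count_covered_tiles only when the rooms are mutually
    non-overlapping, which the text says the architecture guarantees.
    """
    return sum(w * h for _, _, w, h in rooms)
```

For disjoint rooms the two functions agree, but the second needs work proportional to the number of rooms rather than the number of tiles; for overlapping rooms the sum of areas overcounts, which is why the specialized schema must also match the non-overlap constraint.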

Such specialized schemas match a conjunction of 
constraints (e.g., the most specialized RAP matches 
FILL ∧ INS ∧ NONOV). Initially, a global constraint is
matched completely against one of the less
specialized RAP schemas. Each of the specializations
of the RAP schema constitutes a potential reformu- 
lation opportunity. Such an opportunity is processed 
in a goal-directed fashion, in the sense that domain- 
specific instances of the additional constraints which 
must also be true to match the more specialized RAP 
schema are then sought among the conjuncts of P(o) 
(or proven to be antecedents for P(o)). 

Reformulating constraints for designing 
abstraction levels 

The HiT subcompiler (the subject of Sunil Mohan’s 
Ph.D. work [Moh91]) of KBSDE specializes in divid- 
ing the search algorithm architecture into two or more 
levels (if two, they are called the “base level” and the 
“abstract level”). Each of these levels has an asso- 
ciated search algorithm, configured from (constrained) 
generators, testers, and hillclimbing patchers (see, e.g., 
the earlier two-level example). 

A (generally global) constraint P(s) can serve as the
basis for constructing an abstraction level in the
following sense: An abstraction function mapping solutions
s into abstractions f(s) is constructed (e.g., f might
abstract a “room” into a “room area”). An abstract
generator Generate(i : input, a : range(f)) can then be
constructed which generates all elements a in the range
of f(s). P(s) can be abstracted into a test P'(a), which
is applicable to abstract candidate solutions. Thus
one possible search algorithm for the abstract level is:

Generate(i : input, a : range(f));

Test(a | P'(a))


HiT currently is organized around two schemas rep- 
resenting constraint types whose very form makes it 
easy to construct an abstraction function: 

1. Functional constraints: P(F(s)). Based on such a 
constraint, the abstract level generates {z | z = 
F(s)} and the base level is then responsible for en- 
suring that P(refinement(z)) holds. 

2. Disjunctive constraints:
∀s, t [solution(s) ∧ partType(t, s) → ∀p : T
[part(p, s) → P1(p) ∨ P2(p) ∨ ... ∨ Pn(p)]]. The
abstract level generator selects one of these disjuncts
to be true by fiat (i.e., it posts the disjunct as a
constraint). The base level must then construct an s
that satisfies that disjunct.

Compilability 

Reformulating a constraint into disjunctive 
form. Several general rules are used to carry out this 
reformulation (some of which are similar to those used 
to transform a predicate logic assertion into conjunc- 
tive normal form). These include: “Move negations in- 
ward”, “Reformulate the expression as a disjunction”, 
and “Remove variables that refer to non-solution ob- 
jects.” 

Using these transformations, a constraint such as 
NONOV: 

For all pairs of rooms R1 and R2, it is not the
case that there exists a point p that is both inside
room R1 and inside some other room R2.

or: 

∀R1, R2 [Room(R1) ∧ Room(R2) ∧ R1 ≠ R2
    → ¬(∃p [Inside(p, R1) ∧ Inside(p, R2)])]

is eventually re-expressed as: 

∀R1, R2 [Room(R1) ∧ Room(R2) ∧ R1 ≠ R2 →
    [L(R1, R2) ∨ B(R1, R2)
     ∨ R(R1, R2) ∨ A(R1, R2)]]

where the predicates are defined as follows: 

• L(R1,R2): The x coordinate of the right side of R1
is less than or equal to the x coordinate of the left
side of R2.

• B(R1,R2): The y coordinate of the top side of R1 is
less than or equal to the y coordinate of the bottom
side of R2.

• R(R1,R2): The x coordinate of the left side of R1
is greater than or equal to the x coordinate of the
right side of R2.

• A(R1,R2): The y coordinate of the bottom side of
R1 is greater than the y coordinate of the top side
of R2.

At this point, the constraint matches the “disjunc- 
tive constraint schema” and HiT can now proceed to 
construct an abstract level where, for every pair of 
rooms <R1, R2>, one of the four above relationships
is generated as a constraint to be satisfied. 
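
The four relations can be made concrete with a small Python sketch for axis-aligned rectangles. The Rect type and all function names are assumptions introduced for this illustration. One further assumption is labeled in the code: A is implemented with “greater than or equal to”, mirroring B, so that rooms that merely touch along a horizontal edge count as non-overlapping.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    """Hypothetical room encoding: an axis-aligned rectangle."""
    left: float
    bottom: float
    right: float
    top: float

def L(r1: Rect, r2: Rect) -> bool:
    return r1.right <= r2.left      # r1 lies to the left of r2

def B(r1: Rect, r2: Rect) -> bool:
    return r1.top <= r2.bottom      # r1 lies below r2

def R(r1: Rect, r2: Rect) -> bool:
    return r1.left >= r2.right      # r1 lies to the right of r2

def A(r1: Rect, r2: Rect) -> bool:
    return r1.bottom >= r2.top      # r1 lies above r2 (">=" is an assumption)

def nonov_disjunctive(r1: Rect, r2: Rect) -> bool:
    """Disjunctive form of NONOV for one pair of rooms."""
    return L(r1, r2) or B(r1, r2) or R(r1, r2) or A(r1, r2)

def interiors_overlap(r1: Rect, r2: Rect) -> bool:
    """Point-based form: some point is strictly inside both rectangles."""
    return (max(r1.left, r2.left) < min(r1.right, r2.right)
            and max(r1.bottom, r2.bottom) < min(r1.top, r2.top))
```

For rectangles of positive width and height, nonov_disjunctive(r1, r2) holds exactly when interiors_overlap(r1, r2) does not, which is the equivalence the reformulation relies on.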

Efficiency improvement 

Deriving composition laws for the disjunctive
case. As is readily noticed, picking topological
relations at random between pairs of rooms is unlikely
to converge rapidly on an abstract solution that
is actually concretely realizable.

Fortunately, because the predicates of disjuncts can
be viewed as defining relations, we can sometimes
exploit known or provable properties of relations such
as transitivity, reflexivity, or symmetry. Such
properties can be viewed more constructively as composition
rules. Thus for the L(R1,R2) relation, the following
two composition rules can be shown to hold:


• Transitivity. If L(R1,R2) and L(R2,R3), then
L(R1,R3).

• No reflexivity. ¬L(R1,R1).

These composition laws can then be made opera- 
tional in several ways, including: incorporating them 
into the abstract generators using RICK; using them 
to dynamically prune the ranges of the abstract gener- 
ators; using them as abstract tests (supplementing the 
original test). 
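
One way to use the two composition rules as an abstract test is sketched below in Python. The representation of the chosen L relations as a set of ordered pairs of room names, and the function names, are assumptions for this illustration; the sketch is not HiT’s implementation.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (r1, r2) stands for the posted constraint L(r1, r2)

def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Apply the transitivity rule until no new L facts can be derived."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def consistent_left_of(pairs: Set[Pair]) -> bool:
    """Reject an abstract assignment that derives L(r, r) for some room r."""
    return all(a != b for a, b in transitive_closure(pairs))
```

An abstract assignment such as L(R1,R2), L(R2,R3), L(R3,R1) is rejected without ever generating concrete rooms, which is the kind of pruning the composition laws make possible.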

Discussion and conclusions 
Summary 

In this paper, we have briefly described a number of re- 
formulation techniques for use during knowledge com- 
pilation, either to make constraints compilable in the 
first place, or to put them in a form that compiles into 
a more efficient search algorithm. The reformulation 
techniques described here are schema-specific; match- 
ing of a constraint to a given subcompiler’s schema can 
be aided by a reformulation technique, or a constraint 
that (already) matches a particular schema can be put 
in a form (possibly one that matches another schema) 
that will allow it to be more efficiently satisfied. 

Implementation status 

At this point, the reformulation techniques discussed 
for use in the RICK subcompiler have been imple- 
mented; the ones associated with the MENDER and 
HiT subcompilers are the subject of ongoing research. 

Related work 

Antecedent derivation. A schema-specific approach 
to schema-matching is usefully contrasted with a 
general-purpose approach, such as Smith’s antecedent 
derivation method [Smi82]. One difference (we believe) 
is that the “antecedent derivation” process for a given 
schema can be restricted to using a specified (small) 
set of inference rules associated with that schema. A 
second difference is that in some cases, our
schema-specific reformulation technique is a “normalization”
process that works in the forward direction (e.g., to 
put a constraint in a disjunctive form). 

DRAT. Like KBSDE, another schema-oriented
approach that is more specialized than Smith’s
antecedent derivation method is Van Baalen’s DRAT
system [BD88]. KBSDE’s target is an efficient
generate-test-patch architecture; in contrast, DRAT’s target
is an efficient (object-oriented) forward-chaining the- 
orem prover. Both systems take a classification- 
based approach to assigning specified constraints to 
schemas. However, KBSDE’s schemas correspond to 
generic search algorithm components such as gen- 
erator or patcher types, whereas DRAT’s schemas 
correspond to (efficient implementations for) generic 
forward-chaining rules. 


In KBSDE, “incorporating a constraint” in a
constrained generator means that the constraint need no
longer be represented explicitly in the problem solver;
the generator is guaranteed to produce only acceptable
solutions. Similarly, in DRAT, “capturing a
constraint” in a rule implementation also means that that
constraint need not be explicitly mentioned in the
problem solver. KBSDE’s ideal is to incorporate all
constraints in a single (hierarchically organized)
constrained generator, which produces completely correct
solutions in polynomial time. Since this ideal is
seldom achieved (it would require finding a solution
representation in which all constraints are localizable to
solution parts), KBSDE has a set of fallback
strategies: incorporate the “leftover” global constraints in a
patcher, in new abstraction levels, or (least preferred)
in a tester.

DRAT’s analogue to our reformulation techniques
for enabling compilability is called concept introduc- 
tion; by considering alternative formulations for one 
of the task’s concepts, DRAT can sometimes find a 
representation that allows more constraints to be cap- 
turable. A process called operationalization then tries 
to capture the “leftover”, uncaptured constraints by 
writing procedures and using these to further special- 
ize the already selected representations. 

Code optimization. Our reformulations associated
with efficiency improvement are similar in spirit to
intermediate code optimization in standard compiler
technology, in the sense that such optimizations are
done: (a) at a level of abstraction higher than the
target level (in our case, reformulating constraints into
other constraints); and (b) based on a thorough knowl- 
edge of how the compiler to the target level works. 

Research directions 

The set of reformulation techniques presented here is 
under development. Some of the areas still in need 
of further development are: techniques for reformu- 
lating constraints to match RAP schemas; techniques 
for matching the functional constraint schema; tech- 
niques for improving the efficient processing of func- 
tional constraints; and an elaboration of how to best 
exploit derived composition laws for newly constructed 
abstraction levels.

Abstract

Recently there has been increasing interest in the prob- 
lem of knowledge compilation [Selman&Kautz91]. This 
is the problem of identifying tractable techniques for 
determining the consequences of a knowledge base. We 
have developed and implemented a technique, called
DRAT, that given a theory, i.e., a collection of
first-order clauses, can often produce a type of decision
procedure for that theory that can be used in the place of
a general-purpose first-order theorem prover for
determining many of the consequences of that theory.
Hence, DRAT does a type of knowledge compilation.
Central to the DRAT technique is a type of reformula- 
tion in which a problem’s clauses are restated in terms 
of different nonlogical symbols. The reformulation is 
isomorphic in the sense that it does not change the 
semantics of a problem. 

INTRODUCTION 

Recently there has been increasing interest in the prob- 
lem of knowledge compilation [Selman&Kautz91]. This 
is the problem of identifying tractable techniques for 
determining the consequences of a knowledge base. 
Most interesting knowledge bases are written in high- 
ly expressive languages for which the general problem 
of complete inference is intractable (e.g., at least NP- 
hard, usually undecidable). Even though the general 
inference problem in such a language is intractable, 
given a particular knowledge base, it is often possi- 
ble to identify a tractable inference procedure that is 
complete for the inferences required in that knowledge 
base. 

We have developed and implemented a technique,
called DRAT, that given a theory, i.e., a collection of
first-order clauses, can often produce a type of
decision procedure for that theory. This type of procedure
is called a literal satisfiability procedure. Such a
satisfiability procedure for a theory T decides whether or not
a conjunction of ground literals is satisfiable in T. A
literal satisfiability procedure for a theory can be used
in the place of a general-purpose first-order theorem
prover for determining many of the consequences
of that theory. Hence, DRAT does a type of knowledge
compilation.

Obviously, we are better off using a satisfiability pro- 
cedure for determining the consequences of a theory 
than we are using a general-purpose theorem prover be- 
cause the satisfiability procedure is guaranteed to halt. 
However, under what circumstances should we consider 
such a procedure tractable? A straightforward way to 
define tractability is polynomial-time worst-case com- 
plexity and for some theories DRAT can produce a sat- 
isfiability procedure that has this property. For many 
other theories, the satisfiability procedures produced 
are exponential in the worst case. Note that DRAT 
can determine whether a satisfiability procedure it pro- 
duces has polynomial or exponential worst-case behav- 
ior. In either case, the procedures are usually much 
more efficient than a general theorem prover because 
the complexity of the theorem prover proving that a
fact F follows from a theory T is a function of the
size of F ∪ T, while the complexity of the
satisfiability procedure is a function of the size of F.

Even when DRAT cannot produce a literal
satisfiability procedure for an entire theory, it is often an
improvement to use a procedure for a subset of an input
theory because such a procedure can be interfaced with 
a general-purpose theorem prover in such a way that 
the procedure and the theorem prover work together 
to determine the consequences of the theory. 

In practice, so long as a procedure can be found for 
a significant subset of the theory, the resulting infer- 
ence systems are much more efficient than the theorem 
prover alone because many of the inferences that the 
theorem prover would have to do are done more effi- 
ciently by the satisfiability procedure. 

Let V be the set of axioms of a problem and let S
be the satisfiability procedure that DRAT designs for
some subset of V. The theorem prover restricts its
manipulation of the statements in that subset, using S
instead whenever possible. This paper presents a formalization
of DRAT and proves that it is complete, i.e., that for any
first-order statement φ, if V ⊨ φ, S combined with the
theorem prover will prove φ. We show that DRAT’s
reformulation greatly increases its effectiveness and that
a solution to a reformulated version of a problem is
guaranteed to be a solution to the original problem.

We present only a brief description of the DRAT
algorithm here. A detailed description of an
implementation can be found in [VanBaalen89] or [VanBaalen92].

DRAT was inspired by human problem solving per- 
formance on analytical tasks of the type found on grad- 
uate level standardized admissions tests. An example 
problem is given in Figure 1. 

Given: M, N, O, P, Q, R, and S are all members of the
same family. N is married to P. S is a grandchild of Q.
O is a niece of M. The mother of S is the only sister
of M. R is Q’s only child. M has no brothers. N is a
grandfather of O.

Query: Who are the siblings of S? 

Figure 1: The FAMILIES Analytical Reasoning Prob- 
lem 

We analyzed human problem-solving behavior on a 
number of these problems and found the prevalent use 
of diagrams to assist in problem solving. Figure 2 il- 
lustrates the typical diagrams people use to solve the 
problem in Figure 1. 



“R is the only child of Q”    “S is a grandchild of Q”
(Divided rectangles represent couples; circles represent sets
of children of the same couple: full circles are closed sets,
broken circles are sets all of whose members may not be
known; the directed arc represents the “children-of”
function between couples and their sets of children.)

Figure 2: Two statements in a representation
commonly used by people.

These diagrams were found to contain a common
set of structures (across different people and different
problems). The arcs in Figure 2 are an example of
such a structure. They represent the 1-1 function
between a married couple and their set of children. Each
common structure was also found to have a standard
set of procedures for manipulating it. For example,
one procedure associated with the arcs in Figure 2
ensures that they behave like a 1-1 function. It reads
roughly as, “If two objects are equal and they appear
at the same end of two separate 1-1 function arcs with
the same function symbol, the arcs and the objects at
their other end can be composed.” This procedure is
among those used to compose the structures in Figure 2
to yield the diagram in Figure 3.

People use these diagrams to test the satisfiability
of a particular collection of facts by creating the
structures representing each fact and then composing them.
The conjunction is satisfiable just in case no
contradiction is signalled in the composition process.

Figure 3: Composition of the structures in Figure 2.

DRAT has a library of procedures called schemes.
These schemes model people’s diagrammatic
structures and their manipulations. Schemes were found
to have a number of important properties which are
described in this paper. Perhaps the most important
of these properties is that each scheme turns out to be
a satisfiability procedure. Another important property
of schemes is that they can be used as building blocks
to construct “larger” satisfiability procedures. DRAT
uses this property to construct satisfiability procedures
for input problems.

The implementation of DRAT includes the schemes 
found in analyzing the diagrams that people used on 
thirty analytical tasks. It has been tested on twelve 
of these problems stated in a sorted first-order logic. 
The problems vary in size from thirty to sixty sorted 
first-order statements. The performance of the
theorem prover/satisfiability procedure combinations that
DRAT produces for these problems was at least two
orders of magnitude better than the performance of the
theorem prover alone. For example, our general
theorem prover took 988,442 resolutions (three hours and
five minutes) to solve the problem shown in Figure 1.
The satisfiability procedure that DRAT produced was 
able to solve the problem entirely without the theorem 
prover and did so in less than three seconds. 

PRELIMINARIES 

Each scheme is a tractable literal satisfiability procedure
for a theory.

Definition 1 A theory is a set of statements in first- 
order predicate calculus with equality. 

Definition 2 A literal satisfiability procedure for a
theory T is a procedure that decides for any
conjunction of ground literals E whether or not E ∪ T is
satisfiable.

Each scheme is tractable in the sense that, given any E
containing n literals, the scheme for a theory T decides
the satisfiability of E ∪ T in time polynomial in n.

Given a particular E, in addition to determining
literal satisfiability in some theory T, each scheme
computes {u = v | u, v ∈ C ∧ E ∪ T ⊨ u = v}, where
C is the set of constant symbols appearing in E. As
detailed in a later section, these equalities are communicated
between schemes in a way that allows the combination
of schemes to determine satisfiability for the union of
their theories.

One important result of this research is the particu- 
lar library of schemes we have developed from the ob- 
servation of human problem solving of analytical tasks. 
However, in the formal characterization that follows, 
we abstract away from the detail of the current scheme 
library, identifying the properties of schemes required 
for the completeness of DRAT. 

This paper first takes a simplified view of what DRAT
will accept as an input problem and also assumes that
DRAT is only successful if it can produce a satisfiability
procedure for an entire problem. In this setting, we
prove that a combination of schemes is a satisfiability
procedure for the union of the theories of the individual
schemes. In a later section, the above restrictions are relaxed
and it is shown how, in the more general setting, the
procedures produced by DRAT are interfaced with a
theorem prover.

DRAT requires that the formulas of schemes and the 
formulas of an input problem be converted to clauses,
i.e., disjunctions of first-order literals. The remainder 
of the paper assumes that this has been done. However, 
the presentation will often use more intuitive forms 
for statements, when the conversion to clause form is 
straightforward. 

The restricted definition of a problem taken first is: 


Definition 3 A problem is a triple <E, Tc, Φ>,
where E and Φ are sets of ground literals and Tc is a set
of clauses each of which contains at least one variable.
Such a triple is interpreted as a question about whether
or not, for all the ground literals φ ∈ Φ, E ∪ Tc ⊨ φ.


Here is an example problem:

E1 = { grandfather(O, N), married(N, P),
       grandchild(S, Q), niece(O, M) }

Tc1 = { mother(S, x) ⇒ sister(M, x),
        (sister(M, x) ∧ sister(M, y)) ⇒ x = y,
        child(Q, x) ⇒ x = R,
        ¬brother(M, x), ... }

Φ1 = { sibling(O, S), child(N, M) }

In addition to those axioms shown, E1 also contains
disequalities between all of the individual constants
mentioned. Tc1 also contains definitions of concepts
such as grandchild and formulas defining general
properties of the family relation domain such as symmetry
of married.


Given a problem <E, Tc, Φ>, DRAT’s objective is
to design a literal satisfiability procedure for Tc. This
procedure is used to solve the problem for the
particular E and Φ. To determine whether for some φ ∈ Φ,
E ∪ Tc ⊨ φ, the satisfiability procedure for Tc is used
to decide whether or not E ∪ Tc ∪ {¬φ} is unsatisfiable.
For example, DRAT tries to design a satisfiability
procedure for Tc1. If successful, the procedure is used
to decide whether “O is a sibling of S” and “M is
a child of N” follow from E1 ∪ Tc1, by determining
the satisfiability of E1 ∪ Tc1 ∪ {¬sibling(O, S)} and of
E1 ∪ Tc1 ∪ {¬child(N, M)}.

Obviously, we are better off using a satisfiability
procedure for Tc to solve a problem <E, Tc, Φ> than
using a general theorem prover because the
satisfiability procedure is guaranteed to halt. Perhaps less
obvious is the fact that these procedures are usually much
more efficient than a general theorem prover. The
intuition behind this is that the complexity of the theorem
prover solving the problem is a function of the size of
the entire problem, while the complexity of the
satisfiability procedure is a function of the size of E ∪ Φ. As
pointed out earlier, this intuition is substantiated
by the performance of the procedures that DRAT has
designed.

THE DRAT TECHNIQUE 

We will call the relation, function and individual
constant symbols in a theory the nonlogical symbols of that
theory. The nonlogical symbols of each scheme’s
theory are treated as parameters to be instantiated with
the nonlogical symbols of Tc. For example, the scheme
T_symmetric, whose theory is {R(x, y) ⇒ R(y, x)}, is
parameterized by R.
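
To make the notion of a scheme concrete, here is a minimal Python sketch of a literal satisfiability procedure for the symmetry theory {R(x, y) ⇒ R(y, x)}, together with entailment checked by refutation. It is a simplification made for illustration, not DRAT’s scheme: it assumes the input contains only literals of the single relation R and no equalities between constants, and the names Literal, symmetric_satisfiable and entails are hypothetical.

```python
from typing import Iterable, List, Tuple

# (is_positive, a, b) stands for R(a, b) when is_positive, else for ¬R(a, b).
Literal = Tuple[bool, str, str]

def symmetric_satisfiable(literals: Iterable[Literal]) -> bool:
    """Decide satisfiability of ground R-literals under R(x, y) => R(y, x)."""
    lits: List[Literal] = list(literals)
    holds = set()
    for positive, a, b in lits:
        if positive:
            holds.add((a, b))
            holds.add((b, a))  # close the positive facts under symmetry
    # Unsatisfiable exactly when a negative literal denies a derived fact.
    return not any((a, b) in holds
                   for positive, a, b in lits if not positive)

def entails(literals: Iterable[Literal], goal: Literal) -> bool:
    """E ∪ T ⊨ goal iff E ∪ T ∪ {¬goal} is unsatisfiable."""
    positive, a, b = goal
    return not symmetric_satisfiable(list(literals) + [(not positive, a, b)])
```

Instantiating R with married, the fact married(N, P) entails married(P, N) under this procedure, and the running time is linear in the number of input literals.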

DRAT tries to find a set of scheme instances that can
be combined to give a literal satisfiability procedure
for Tc. Consider a set of scheme instances. Call the
union of the theories of each scheme instance T_I. DRAT
has succeeded in finding a satisfiability procedure when
it finds a T_I that is logically equivalent to Tc. The
following is an abstract description of this process:
instances ← ∅
T_I ← ∅
T'c ← Tc

UNTIL empty(T'c) DO
    instance ← choose-instance(T'c)
    IF null(instance) THEN EXIT-WITH failure
    instances ← union(instance, instances)
    T_I ← union(theory(instance), T_I)
    FOR EACH φ ∈ T'c
        WHEN T_I ⊨ φ DO T'c ← T'c − φ
    END FOR
END UNTIL
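
The loop above can be sketched in Python as follows. Everything concrete here is a toy stand-in and an assumption of this example: clauses are opaque strings, the two-entry LIBRARY is hypothetical, choose_instance picks the first instance whose theory appears verbatim among the remaining clauses, and entails is plain set membership rather than a real satisfiability check.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

Clause = str  # clauses are opaque strings in this toy

# Hypothetical scheme library: instance name -> theory of that instance.
LIBRARY: Dict[str, FrozenSet[Clause]] = {
    "symmetric(married)": frozenset({"married(x,y) => married(y,x)"}),
    "transitive(ancestor)": frozenset(
        {"ancestor(x,y) & ancestor(y,z) => ancestor(x,z)"}),
}

def choose_instance(remaining: Set[Clause]) -> Optional[str]:
    """Toy choice: first instance whose theory is contained in T'c."""
    for name, clauses in LIBRARY.items():
        if clauses <= remaining:
            return name
    return None

def entails(theory: Set[Clause], clause: Clause) -> bool:
    """Toy entailment: syntactic membership only."""
    return clause in theory

def design(t_c: Iterable[Clause]) -> Optional[List[str]]:
    """Return scheme instances whose theories cover t_c, or None on failure."""
    instances: List[str] = []
    t_i: Set[Clause] = set()
    remaining: Set[Clause] = set(t_c)            # T'c
    while remaining:
        instance = choose_instance(remaining)
        if instance is None:
            return None                          # EXIT-WITH failure
        instances.append(instance)
        t_i |= LIBRARY[instance]
        remaining = {phi for phi in remaining if not entails(t_i, phi)}
    return instances
```

The toy makes a single deterministic choice at each step, whereas the algorithm as described is nondeterministic and may need to search over alternative choices.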

A set of scheme instances is built up incrementally
and, simultaneously, the set of clauses in T'c is pared
down. Each time choose-instance is invoked, it
inspects T'c and chooses a scheme instance whose theory
is entailed by T'c. After the theory of instance is added
to T_I, DRAT removes clauses from T'c that are entailed
by T_I.

DRAT uses the following procedure for computing
satisfiability in T_I to determine the φ ∈ T'c that follow
from T_I. For each clause φ, it creates φ' by
substituting a new individual constant for each unique variable
in φ. If the satisfiability procedure for T_I reports that
¬φ' ∪ T_I is unsatisfiable, T_I ⊨ φ.

If the algorithm is exited with T'c empty, DRAT has
succeeded in finding a T_I that is equivalent to Tc. To
see this, note that T'c ∪ T_I ≡ Tc is an invariant of the
loop. Adding theory(instance) to T_I does not violate
this condition because T'c ⊨ theory(instance).
Removing from T'c clauses φ such that T_I ⊨ φ also does
not violate the condition.

If the algorithm is exited because choose-instance
returns nil, it has failed to find a T_I that is equivalent
to Tc.

Note that this algorithm is nondeterministic
because, in general, on a call to choose-instance, there
are several instances from which to choose. The DRAT
implementation searches for an appropriate collection
of scheme instances. This search is reduced
considerably by the fact that scheme instances in T_I may not
share nonlogical symbols. As discussed in a later section, this
restriction is required to allow schemes to be combined
by the method described below. More detail on how
the DRAT implementation controls this search can be
found in [VanBaalen92].

A PROCEDURE FOR COMBINING 
SCHEMES 

Since T_I is the theory of a set of scheme instances, so
long as these instances do not share nonlogical
symbols, DRAT has a satisfiability procedure for T_I. This
procedure is the combination of schemes used to create T_I.
DRAT’s combination technique is the same technique as
reported by Nelson & Oppen in [Nelson&Oppen79] and
a more detailed description than what follows can be
found there.

Let L(T) be the set of nonlogical symbols appearing
in the clauses of T. We will often refer to L(T) as the
language of T. Consider two scheme instances, T1 and
T2, where L(T1) is disjoint from L(T2), and consider a
conjunction of literals E in L(T1 ∪ T2). The procedure
for deciding the satisfiability of E ∪ T1 ∪ T2 begins by
splitting E into two conjunctions of literals: E1, with
literals in L(T1), and E2, with literals in L(T2), such that
the conjunction of literals in E1 and E2 is satisfiable
just in case E is.

When a literal in E contains nonlogical symbols from
L(T1 ∪ T2), remove each subterm whose function
symbol is not in the language of the head symbol of the
term. A subterm is removed by substituting a new
constant symbol for that subterm in the literal and
conjoining an equality between the term and the new
symbol with the proper Ei. For example, suppose R
is in L(T1), f is in L(T2) and E contains the literal
R(f(a)). The embedded term is in the wrong language,
so it is removed. This is done by substituting a new
constant, say b, for f(a) in R(f(a)) to obtain R(b) and
conjoining b = f(a) with E2.

For each literal in E, this technique is applied
repeatedly to the rightmost function symbol in the wrong
language until the literal no longer contains symbols
in the wrong language. Then the literal is conjoined
with the appropriate Ei. For instance, R(b) from the
example above contains no symbols in the wrong
language so it is conjoined with E1.
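
The splitting step can be sketched in Python as below. The term encoding (nested tuples with the symbol first, constants as strings), the lang_of table mapping each relation or function symbol to language 1 or 2, and the fresh-constant names k1, k2, ... are assumptions of this example. The sketch works bottom-up by recursion rather than repeatedly rewriting the rightmost symbol, which yields the same split up to the naming of new constants, and it assumes each literal is a positive atom whose head belongs to exactly one language.

```python
import itertools
from typing import Dict, Iterator, List, Sequence, Tuple

def purify(term: tuple, lang_of: Dict[str, int],
           fresh: Iterator[str], parts: Dict[int, List[tuple]]) -> tuple:
    """Return `term` with every alien subterm replaced by a new constant."""
    head, *args = term
    home = lang_of[head]
    new_args = []
    for arg in args:
        if isinstance(arg, tuple):                    # a function application
            pure_arg = purify(arg, lang_of, fresh, parts)
            owner = lang_of[arg[0]]
            if owner != home:                         # wrong language: remove it
                const = next(fresh)
                parts[owner].append(("=", const, pure_arg))
                pure_arg = const
            new_args.append(pure_arg)
        else:                                         # constants are shared
            new_args.append(arg)
    return (head, *new_args)

def split(literals: Sequence[tuple],
          lang_of: Dict[str, int]) -> Tuple[List[tuple], List[tuple]]:
    """Split a conjunction of atoms into E1 (language 1) and E2 (language 2)."""
    fresh = (f"k{i}" for i in itertools.count(1))
    parts: Dict[int, List[tuple]] = {1: [], 2: []}
    for literal in literals:
        pure = purify(literal, lang_of, fresh, parts)
        parts[lang_of[literal[0]]].append(pure)
    return parts[1], parts[2]
```

On the example from the text, split([("R", ("f", "a"))], {"R": 1, "f": 2}) places R(k1) in E1 and k1 = f(a) in E2, with k1 playing the role of the new constant b.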

Next the scheme for T1 is used to determine the
satisfiability of E1 ∪ T1. Recall that in so doing, this scheme
also computes the set of equalities between constants
in E1 that follow from E1 ∪ T1. Call this set E'1. The
scheme for T2 is used to determine the satisfiability of
E2 ∪ T2 ∪ E'1. If it is satisfiable, E'2, the set of equalities
that follow from E2 ∪ T2 ∪ E'1, is propagated back to
T1, i.e., the scheme for T1 is applied to E1 ∪ T1 ∪ E'2.
This propagation of equalities continues until one of
the schemes reports “unsatisfiable” or until no new
equalities are computed. Note that since there are at
most n − 1 nonredundant equalities between n
constant symbols, this process will terminate. Unless the
scheme for T1 or T2 reports “unsatisfiable,” the
procedure for the combination returns “satisfiable.”
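
The propagation loop for the convex case can be sketched in Python as follows. The scheme interface is an assumption of this example: a scheme is a function taking its own literals plus the equalities received so far and returning None for “unsatisfiable” or the set of equalities between constants that it can derive. The equality_scheme used to exercise the loop is a stand-in for a real scheme (pure equality with disequalities, decided by union-find), not one of DRAT’s schemes.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

Eq = FrozenSet[str]                       # an equality between two constants
Lit = Tuple[str, str, str]                # ("=", a, b) or ("!=", a, b)
Scheme = Callable[[List[Lit], Set[Eq]], Optional[Set[Eq]]]

def equality_scheme(literals: List[Lit], equalities: Set[Eq]) -> Optional[Set[Eq]]:
    """Stand-in scheme: return None if unsatisfiable, else implied equalities."""
    parent = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for op, x, y in literals:
        find(x)
        find(y)
        if op == "=":
            parent[find(x)] = find(y)
    for pair in equalities:
        x, y = tuple(pair)
        parent[find(x)] = find(y)
    if any(op == "!=" and find(x) == find(y) for op, x, y in literals):
        return None
    consts = sorted(parent)
    return {frozenset((x, y)) for x in consts for y in consts
            if x < y and find(x) == find(y)}

def combine(scheme1: Scheme, e1: List[Lit], scheme2: Scheme, e2: List[Lit]) -> str:
    """Exchange equalities between two schemes until a fixpoint or a clash."""
    shared: Set[Eq] = set()
    while True:
        new: Set[Eq] = set()
        for scheme, lits in ((scheme1, e1), (scheme2, e2)):
            result = scheme(lits, shared | new)
            if result is None:
                return "unsatisfiable"
            new |= result
        if new <= shared:                 # no new equalities were computed
            return "satisfiable"
        shared |= new
```

The loop terminates because the set of shared equalities only grows and is bounded by the number of pairs of constants. Case splitting for nonconvex schemes is deliberately left out of this sketch.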

A complication to this equality propagation pro- 
cedure is that given a set of ground literals, many 
tractable schemes imply disjunctions of equalities be- 
tween constants without implying any of the dis- 
juncts alone, a property called nonconvexity in 
[Nelson&Oppen79]. An example of a convex scheme 
is one that determines satisfiability for the theory of 
equality with uninterpreted function symbols. An ex- 
ample of a nonconvex scheme is one for the theory 
of sets. To see this, note that $\{a, b\} = \{c, d\}$ implies
$a = c \lor a = d$, but does not imply either equality alone.

A scheme associated with a nonconvex theory must compute disjunctions
of equalities between constants that follow from a given conjunction of
ground literals. The equality propagation procedure is extended to
handle such schemes by case splitting when a nonconvex scheme produces a
disjunction. When one of the component schemes produces the disjunction
$c_1 = d_1 \lor \cdots \lor c_n = d_n$, the combined satisfiability
procedure is applied recursively to the conjunctions
$\Sigma_1 \cup \Sigma_2 \cup \{c_1 = d_1\}, \ldots, \Sigma_1 \cup \Sigma_2 \cup \{c_n = d_n\}$.
If any of these is satisfiable, “satisfiable” is returned, otherwise
“unsatisfiable” is returned.
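Case splitting adds one recursive step to the propagation loop. In the Python sketch below, the scheme interface is again an assumption made for illustration: each scheme is a callable that maps the known equalities to `None` (unsatisfiable) or to a pair consisting of the entailed equalities and a possibly empty list of disjuncts.

```python
def combined_sat(schemes, known=frozenset()):
    """Equality propagation with case splitting on entailed disjunctions."""
    changed = True
    while changed:
        changed = False
        for scheme in schemes:
            result = scheme(known)
            if result is None:
                return False
            equalities, disjunction = result
            if not equalities <= known:
                known = known | equalities
                changed = True
            elif disjunction and not any(eq in known for eq in disjunction):
                # try each disjunct in turn; one satisfiable branch suffices
                return any(combined_sat(schemes, known | {eq})
                           for eq in disjunction)
    return True

def convex_stub(known):
    # Stand-in: inconsistent with a = b and with a = c.
    if ("a", "b") in known or ("a", "c") in known:
        return None
    return frozenset(), []

def nonconvex_stub(known):
    # Stand-in: entails a = b or a = c, but neither disjunct alone.
    return frozenset(), [("a", "b"), ("a", "c")]

print(combined_sat([convex_stub, nonconvex_stub]))  # False
```

Each recursive call adds an equality to the known set, so the depth of the recursion is bounded by the number of possible equalities between constants.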

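As a simple example of this procedure, consider two schemes:
$\mathcal{E}$ for the theory of equality with uninterpreted function
symbols and $\mathcal{S}$ for the theory of finite sets. Now consider
whether

$$\Sigma = [\, f(a) = \{b, g\} \land f(c) = \{d, e\} \land a = c \land g \neq d \land g \neq e \land b \neq d \land b \neq e \,]$$

is satisfiable. First $\Sigma$ is split into

$$\Sigma_1 = [\, a = c \land g \neq d \land g \neq e \land b \neq d \land b \neq e \land f(a) = c_1 \land f(c) = c_2 \,]$$

$$\Sigma_2 = [\, c_1 = \{b, g\} \land c_2 = \{d, e\} \,].$$

$\mathcal{E}$ is run on $\Sigma_1$ and determines that $c_1 = c_2$.
$\mathcal{S}$ is run on $\Sigma_2 \cup \{c_1 = c_2\}$, which produces the
disjunction $b = d \lor b = e$. The procedure is now invoked recursively
for $\Sigma_1 \cup \Sigma_2 \cup \{b = d\}$ and
$\Sigma_1 \cup \Sigma_2 \cup \{b = e\}$. In both calls, the scheme
$\mathcal{S}$ produces the disjunction $g = d \lor g = e$, which is
unsatisfiable because each disjunct contradicts $\Sigma_1$. Therefore,
both calls return “unsatisfiable”; hence
$\Sigma \cup \mathcal{E} \cup \mathcal{S}$ is unsatisfiable.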

We place one additional requirement on schemes to make the equality
propagation procedure practical. Schemes must be incremental. This means
that a scheme must be able to save its “state” when a conjunction of
literals is satisfiable and it must be able to use the saved state to
determine the satisfiability of larger conjunctions at incremental cost.

REFORMULATION 

The DRAT technique as described in section is severe- 
ly limited by the way in which a problem is stated. 
Often, it is much more successful with an equivalent 
formulation of the problem stated in terms of a dif- 
ferent collection of nonlogical symbols. For instance, 
recall the problem about family relations given in sec- 
tion . It was stated in terms of the binary relation 
child. It turns out that, given the current scheme li- 
brary, the DRAT implementation is much more success- 
ful when the problem is stated in terms of parents , a 
function from an individual to his or her set of par- 
ents. One reason this formulation is better is that the 
library contains a scheme for a theory of fixed sized 
sets. DRAT discovers an instance of this scheme that 
allows it to remove several general clauses from the 
problem including one that limits the size of parent 
sets to two. 

In an effort to circumvent this sensitivity to a problem’s formulation,
DRAT is able to reformulate a problem in terms of new nonlogical symbols
without changing the “meaning” of the problem. Choose-instance is often
able to find scheme instances in reformulated problems where it was
unable to do so in the initial formulations. DRAT’s reformulation
technique is modeled after the reformulation that people do in solving
analytical tasks. For an example of this refer again to the problem and
diagrams given in section . In the diagrams appear concepts such as
“married couples” and “sets of children of the same couple.” These
concepts are not present in the initial problem formulation; the
problem has been reformulated.

DRAT does a particular kind of reformulation called isomorphic
reformulation in [Korf80]. We formalize isomorphic reformulation as a
relation between theories.

Definition 4 A reformulation map between two languages $\mathcal{L}_1$
and $\mathcal{L}_2$ is a function from clauses in $\mathcal{L}_1$ to
sets of clauses in $\mathcal{L}_2$.

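Definition 5 A theory $T_2$ is an isomorphic reformulation of a theory
$T_1$ just in case there exists a reformulation map
$\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}$ such that
$T_1 \models \phi \Leftrightarrow T_2 \models \mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(\phi)$,
for every clause $\phi$ in $\mathcal{L}(T_1)$.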

If $T_2$ is an isomorphic reformulation of $T_1$, any question we have
about what clauses are entailed by $T_1$ can be answered by theorem
proving in $T_2$. Given the question, “does $T_1 \models \phi$?” we use
$\mathcal{R}^*$ to translate $\phi$ into $\mathcal{L}(T_2)$ and then
attempt to prove that $T_2 \models \mathcal{R}^*(\phi)$.


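As a simple example of isomorphic reformulation, consider the following
two theories:

$$T_1 = \{\, R(x,x),\;\; R(x,y) \Rightarrow R(y,x),\;\; R(x,y) \land R(y,z) \Rightarrow R(x,z) \,\}$$

$$T_2 = \{\, x \in \text{R-class}(x),\;\; x \in \text{R-class}(y) \Rightarrow y \in \text{R-class}(x),\;\; x \in \text{R-class}(y) \land y \in \text{R-class}(z) \Rightarrow x \in \text{R-class}(z) \,\}$$

$T_2$ is an isomorphic reformulation of $T_1$. To show this, we exhibit
an appropriate $\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}$.
First, we introduce the function $\gamma$ with
$\gamma(R(x,y)) = x \in \text{R-class}(y)$ and
$\gamma(\neg R(x,y)) = x \notin \text{R-class}(y)$.

The function $\gamma$ is also defined in the obvious way for literals
that are instances of the patterns $R(x,y)$ and $\neg R(x,y)$, i.e.,
given the constants $a$ and $b$,

$$\gamma(R(a, f(b))) = a \in \text{R-class}(f(b)).$$

Given the literals $\phi_1, \ldots, \phi_n$, $n \geq 1$,

$$\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(\phi_1 \lor \cdots \lor \phi_n) = \{\gamma(\phi_1) \lor \cdots \lor \gamma(\phi_n)\}.$$

Now $T_2 = \mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(T_1)$ using
the obvious extension of $\mathcal{R}^*$ to sets of clauses. Therefore,
$T_1 \models \phi \Leftrightarrow T_2 \models \mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(\phi)$.
To see this, note that we can take any resolution proof of
$T_1 \vdash \phi$ and uniformly apply
$\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}$ to the clauses in
each step of the proof to obtain a proof of
$\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(T_1) \vdash \mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(\phi)$.
We can also define $\mathcal{R}^*_{\mathcal{L}(T_2),\mathcal{L}(T_1)}$
similarly to $\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}$ and use
it to transform any proof of
$\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(T_1) \vdash \mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}(\phi)$
into a proof of $T_1 \vdash \phi$.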

ADDING REFORMULATION TO 
DRAT 

One strategy for finding a satisfiability procedure for a theory $T_1$
is to identify a theory $T_2$ with the following properties: (1) a
satisfiability procedure is known for $T_2$, (2) we can find a
reformulation map $\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}$
demonstrating that $T_2$ is an isomorphic reformulation of $T_1$ and
(3) $\mathcal{R}^*_{\mathcal{L}(T_1),\mathcal{L}(T_2)}$ is a computable
function.

The actual DRAT technique is an extension of the algorithm discussed in
section to apply the above strategy. This extension enables DRAT to
generate theories that are isomorphic reformulations of $T_c$ while
searching for a set of scheme instances that is a satisfiability
procedure for $T_c$. DRAT has a library of reformulation rules, each of
which is a reformulation map. These rules are applied to an input theory
$T_c$ to construct theories that are isomorphic reformulations of $T_c$.
The extended algorithm searches for scheme instances in these isomorphic
reformulations as well as in the original $T_c$.

Roughly, each reformulation rule is viewed as an axiom schema that can
be instantiated with nonlogical symbols and used as a rewrite rule to
reformulate a theory. To understand this view, consider the following
axiom schema in which $R$ is a parameter:

$$R(x,y) \Leftrightarrow x \in F_R(y).$$

This states that for any binary relation, there is a projection function
$F_R$ that is a mapping from individuals to sets of individuals such
that $F_R(y) = \{x \mid R(x,y)\}$.

DRAT can apply the above reformulation rule to binary relations in
$T_c$. When the rule is applied to $R$ in $T_c$, the new function symbol
$F_R$ is introduced and $T_c$ is reformulated in terms of $F_R$. For
instance, if this rule is applied to child in the family relations
problem given earlier, it will introduce a function that we will call
parents, from an individual to his or her set of parents. DRAT uses the
formula introducing parents, i.e.,
$child(x,y) \Leftrightarrow x \in parents(y)$, to reformulate the
problem, rewriting all occurrences of $child(x,y)$ to
$x \in parents(y)$.
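As a small illustration of this rewriting step, the Python sketch below applies the projection rule to a set of clauses. The representation is an assumption made for the example only: a clause is a list of literals, a literal is a nested tuple, negation is written `("not", literal)` and set membership is written `("in", element, set_term)`.

```python
def project(literal, relation, function):
    """Rewrite relation(x, y) as x in function(y), also under negation."""
    if literal[0] == "not":
        return ("not", project(literal[1], relation, function))
    if literal[0] == relation and len(literal) == 3:
        _, x, y = literal
        return ("in", x, (function, y))
    return literal                      # other literals are left unchanged

def reformulate(clauses, relation, function):
    """Apply the projection rule to every literal of every clause."""
    return [[project(lit, relation, function) for lit in clause]
            for clause in clauses]

# not child(x, y) or person(x)   becomes   x not in parents(y) or person(x)
theory = [[("not", ("child", "x", "y")), ("person", "x")]]
print(reformulate(theory, "child", "parents"))
# [[('not', ('in', 'x', ('parents', 'y'))), ('person', 'x')]]
```

Because every occurrence of the relation matches the pattern, the old symbol disappears from the rewritten clauses.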

This example reformulation rule can be applied to any binary relation in
any theory. More generally, DRAT’s reformulation rules are conditional
on properties of nonlogical symbols in a theory. A property of a
nonlogical symbol is simply a first-order statement mentioning that
symbol. Before giving the general form of reformulation rules, we
introduce the function rf-symbols($T$), the set of relation and function
symbols of $T$. The set rf-symbols($T$) does not contain the symbols $=$
or $\in$, even if they are mentioned in $T$. These are treated as
special (logical) symbols in the reformulation process.

The general form of reformulation rules is given in 
the following definition. 

Definition 6 A triple $\langle P, Q, \Theta \Leftrightarrow \Psi \rangle$
is a reformulation rule when it meets the following restrictions:
(1) $P$ and $Q$ are conjunctions of clauses (both of which may be
empty). (2) $\Theta$ and $\Psi$ are conjunctions of literals.
(3) rf-symbols($P$) $\subseteq$ rf-symbols($\Theta$) and
rf-symbols($Q$) $\subseteq$ rf-symbols($\Psi$). (4) rf-symbols($\Theta$)
is disjoint from rf-symbols($\Psi$). (5) $\Theta$ and $\Psi$ have the
same variables.
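The symbol-level restrictions on rules can be checked mechanically. The Python sketch below is an illustration only and rests on assumptions: formulas are nested tuples, every string leaf is treated as a variable (so the rule is assumed to contain no constants), and the restrictions are read as: the symbols of each condition are among those of its side of the biconditional, the two sides share no relation or function symbols, and they have the same variables. Restrictions (1) and (2) on the shape of the four components are not checked.

```python
LOGICAL = {"=", "in", "not"}      # treated as logical symbols, as in the text

def rf_symbols(formula):
    """Relation and function symbols occurring in a nested-tuple formula."""
    if isinstance(formula, str):
        return set()
    head, *args = formula
    symbols = set() if head in LOGICAL else {head}
    for arg in args:
        symbols |= rf_symbols(arg)
    return symbols

def variables(formula):
    """String leaves of a formula (all assumed to be variables)."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(arg) for arg in formula[1:]))

def all_of(fn, formulas):
    """Union of fn over a list of formulas (empty list gives the empty set)."""
    return set().union(*(fn(f) for f in formulas))

def is_well_formed(p, q, theta, psi):
    """Check the symbol and variable restrictions on a rule."""
    return (all_of(rf_symbols, p) <= all_of(rf_symbols, theta)
            and all_of(rf_symbols, q) <= all_of(rf_symbols, psi)
            and all_of(rf_symbols, theta).isdisjoint(all_of(rf_symbols, psi))
            and all_of(variables, theta) == all_of(variables, psi))

# The unconditional projection rule: R(x, y) iff x in F_R(y).
print(is_well_formed([], [], [("R", "x", "y")], [("in", "x", ("F_R", "y"))]))
# True
```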

Rules are symmetric in the sense that their biconditionals can be used
to introduce new symbols in “either direction.” When the parameters in
$\Theta$ are instantiated with symbols in a theory $T$, the rule is used
to reformulate $T$ in terms of the new symbols in $\Psi$. The
conjunction of clauses $P$ is the condition that must be true of a
theory for the reformulation rule to be used to rewrite $\Theta$ as
$\Psi$. When the parameters in $\Psi$ are instantiated with the symbols
in $T$, the rule is used to reformulate $T$ in terms of the new symbols
in $\Theta$. In this case, $Q$ is the condition that must be true for
the rule to be used.

Here is an example of a conditional reformulation rule:

$$\langle\, [x \in F(y) \Rightarrow F(y) = \{x\}],\ \ ,\ [x \in F(y) \Leftrightarrow x \neq \bot \land x = F'(y)] \,\rangle\,^{1}$$

This rule can be applied to any theory $T$ containing a function $F$
whose range elements are sets of size one, i.e.,
$P = [x \in F(y) \Rightarrow F(y) = \{x\}]$. When applied, the rule
reformulates $T$ in terms of a function $F'$ such that $F'(y) = x$ just
in case $x \in F(y)$. $Q$ is empty in this rule because the rule can
always be applied in the other direction.

1 The symbol $\bot$ is used in specifying axioms about partial
functions; $F(a) = \bot$ means that $F(a)$ is undefined.

The following is an abstract description of the DRAT algorithm extended
to do reformulation:

instances ← ∅
T_I ← ∅
T'_c ← T_c
R* ← λ(t).t
UNTIL empty(T'_c) DO
  EITHER
    ref-pairs ← choose-ref-pairs(T'_c)
    IF null(ref-pairs) THEN EXIT-WITH failure
    symbols, rule ← choose(ref-pairs)
    instantiated-rule ← instantiate(rule, symbols)
    T'_c ← R(instantiated-rule, T'_c)
    R* ← λ(t).R(instantiated-rule, R*(t))
  OR
    instance ← choose-instance(T'_c)
    IF null(instance) THEN EXIT-WITH failure
    instances ← union(instance, instances)
    T_I ← union(theory(instance), T_I)
    FOR EACH φ ∈ T'_c
      WHEN T_I ⊨ φ DO T'_c ← T'_c − φ
    END FOR
END UNTIL

DRAT nondeterministically either chooses a reformulation rule and
reformulates $T'_c$ or adds the theory of the new instance to $T_I$.
Choose-instance identifies an instance by identifying properties of the
nonlogical symbols in $T'_c$. It looks for properties that appear in the
theories of schemes. For example, when the scheme library contains a
scheme one of whose axioms is $R(x,y) \Rightarrow R(y,x)$, DRAT attempts
to choose instances of that scheme by looking for binary relations in
$T'_c$ that have the symmetry property.

Choose-ref-rules uses the identified properties of nonlogical symbols in
$T'_c$ to identify reformulation rules that can be applied to those
symbols. Rules introduce new symbols as explained above.
Choose-ref-rules returns a list of ⟨symbols, rule⟩ pairs, where symbols
is an ordered list of nonlogical symbols. Each pair in the list can be
applied to $T_c$ by instantiating the parameters of the rule with
symbols. For a rule of the form

$$\langle P, Q, \Theta \Leftrightarrow \Psi \rangle,$$

symbols can either be used to instantiate the parameters in $\Theta$ or
in $\Psi$ but not both. Conditional rules are returned only when $T'_c$
entails their condition. Choose-ref-rules guarantees that if symbols
instantiates $\Theta$ then $P$ follows from $T'_c$; if symbols
instantiates $\Psi$ it guarantees that $Q$ follows.

As before, if DRAT exits with $T'_c$ empty, it has succeeded in finding
a $T_I$ equivalent to $T_c$. Otherwise, it has failed.

Again we have suppressed the issues of search by giving a
nondeterministic procedure. The search conducted by the extended
algorithm is over a much larger space than the search conducted by the
simple algorithm described in section . The DRAT implementation with
reformulation must compare alternative problem formulations.
Fortunately, we have found some effective heuristics for controlling the
search. See [VanBaalen89] or [VanBaalen91] for details.

The procedure instantiate instantiates a rule with respect to the
nonlogical symbols in symbols to produce an instantiated-rule.
$\mathcal{R}$ is the reformulation procedure. We describe this procedure
for the case where a rule of the form

$$\langle P, Q, \theta_1 \land \cdots \land \theta_n \Leftrightarrow \Psi \rangle$$

is used to rewrite occurrences of $\theta_1 \land \cdots \land \theta_n$,
the from conjunct, to occurrences of $\Psi$, the to conjunct. The
procedure for applying the rule in the other direction is obtained by
reversing the biconditional and replacing references to $P$ by
references to $Q$.

Each set of unit clauses in $T'_c$ of the form
$\{(\theta_1)\sigma, \ldots, (\theta_n)\sigma\}$, where $\sigma$ is a
substitution for the variables in the $\theta_i$, is rewritten as the
set of unit clauses $(\Psi)\sigma$. Each clause containing the literals
$(\neg\theta_1)\sigma, \ldots, (\neg\theta_n)\sigma$ is rewritten to
contain $(\neg\Psi)\sigma$. After all possible occurrences are
rewritten, the clauses in $Q$ are added to the rewritten theory.

We call a rewriting produced by $\mathcal{R}$ complete when it removes
all of the nonlogical symbols appearing in the from conjunct.
$\mathcal{R}$ may or may not produce a complete rewriting. For example,
given a right hand side of the form $R(f(x))$, rewriting will only be
complete when $R$ and $f$ appear in a theory only in patterns of this
form. If the rewriting process is not complete, $\mathcal{R}$ adds the
instantiated $\Theta \Leftrightarrow \Psi$ to the rewritten theory.

As an example of applying $\mathcal{R}$, consider again the rule

$$\langle\, [x \in F(y) \Rightarrow F(y) = \{x\}],\ \ ,\ [x \in F(y) \Leftrightarrow x \neq \bot \land x = F'(y)] \,\rangle.$$

As noted, the condition $P$ must follow from a theory to reformulate $F$
as $F'$ in that theory. Since the condition $Q$ is empty, there are no
clauses to add to the resulting theory. If the rewriting is not
complete, $[x \neq \bot \land x = F'(y) \Leftrightarrow x \in F(y)]$ is
added to the rewritten theory. Since there is no condition $Q$, this
rule can always be used, in the other direction, to reformulate $F'$ as
$F$. In this case, $P$ is added to the rewritten theory. Again, the
biconditional may need to be added to the rewritten theory.

To ensure that the extended DRAT algorithm gen- 
erates only isomorphic reformulations, each reformula- 
tion rule must be shown to generate only isomorphic 
reformulations. To guarantee this, we require that, 
when instantiated, each reformulation rule be an ex- 
tending definition. 

Definition 7 A reformulation rule
$\langle P, Q, \Theta \Leftrightarrow \Psi \rangle$ is an extending
definition if for all theories $T$ the following conditions hold:

1. Whenever rf-symbols($\Theta$) $\subseteq$ rf-symbols($T$),
rf-symbols($\Psi$) is disjoint from rf-symbols($T$) and $T \models P$,
then every model of $T$ can be expanded to a model of
$T \cup \{\Theta \Leftrightarrow \Psi\}$.

2. Whenever rf-symbols($\Psi$) $\subseteq$ rf-symbols($T$),
rf-symbols($\Theta$) is disjoint from rf-symbols($T$) and $T \models Q$,
then every model of $T$ can be extended to a model of
$T \cup \{\Theta \Leftrightarrow \Psi\}$.

Section shows that for any reformulation rule rule,
$\lambda(t).\mathcal{R}(rule, t)$ is a computable function and, so long
as rule is an extending definition, that whenever a theory $T$ entails
the appropriate condition of rule, $\mathcal{R}(rule, T)$ is an
isomorphic reformulation of $T$.

The $\mathcal{R}^*$ produced by DRAT on the problem
$\langle \Sigma, T_c, \Phi \rangle$ is the composition of reformulation
maps used by the algorithm to reformulate $T_c$. Since each
reformulation map generates an isomorphic reformulation,
$\mathcal{R}^*(T_c)$ is an isomorphic reformulation of $T_c$. Since each
step is computable, $\mathcal{R}^*$ is a computable function.

Finally we point out that, since $\Psi$ and $\Theta$ in the
reformulation rule $\langle P, Q, \Theta \Leftrightarrow \Psi \rangle$
are required to have the same variables, $\mathcal{R}^*(\Sigma)$ and
$\mathcal{R}^*(\Phi)$ will always be ground. However, even though
$\Sigma$ and $\Phi$ are conjunctions of ground literals,
$\mathcal{R}^*(\Sigma)$ and $\mathcal{R}^*(\Phi)$ may not be. To see
this, suppose that $\Sigma$ contains the literal $\neg\phi$ and
$\mathcal{R}^*(\phi)$ is a conjunction. Then $\neg\mathcal{R}^*(\phi)$
will be a disjunction.

Section shows that when DRAT uses reformulation in designing a
satisfiability procedure for a problem
$\langle \Sigma, T_c, \Phi \rangle$ and $\mathcal{R}^*(\Sigma)$ is a
conjunction of literals, the problem can be solved by solving
$\langle \mathcal{R}^*(\Sigma), \mathcal{R}^*(T_c), \mathcal{R}^*(\Phi) \rangle$.
The fact that a satisfiability procedure for a reformulation of a
problem requires $\mathcal{R}^*(\Sigma)$ to be a conjunction of literals
is not a significant difficulty in the more general setting discussed in
section in which satisfiability procedures are used in conjunction with
a theorem prover.

AN EXAMPLE 

In practice, we have found that adding reformulation to DRAT increases
its effectiveness considerably. We illustrate this with a relatively
simple example excerpted from the DRAT implementation’s design of a
satisfiability procedure for the example problem given in section . We
illustrate the implementation’s behavior on the set $T$ of clauses:

$$\neg married(x, x),$$
$$married(x, y) \Rightarrow married(y, x),$$
$$married(x, y) \land married(y, z) \Rightarrow \neg married(x, z),$$
$$married(y, x) \land married(z, x) \Rightarrow y = z$$

There are three schemes in DRAT’s library that are relevant to the
example: the scheme $\mathcal{F}$ for the theory of partial 1-1
functions with parameters $F$ and $F'$, which are inverse functions, and
theory($\mathcal{F}$) $= \{x = F(y) \land x \neq \bot \Leftrightarrow y = F'(x) \land y \neq \bot\}$;
the scheme $\mathcal{S}_2$ for the theory of sets of size two with $S$
as a parameter and
theory($\mathcal{S}_2$) $= \{x_1 \in S \land x_2 \in S \land x_1 \neq x_2 \Rightarrow S = \{x_1, x_2\}\}$;
and the scheme $\mathcal{E}$ for the theory of equality with
uninterpreted function symbols.

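The relevant reformulation rules are:

$$r_1 = \langle\ ,\ \ ,\ R(x,y) \Leftrightarrow y \in F_R(x) \,\rangle$$

$$r_2 = \langle\, x \in F(y) \Rightarrow F(y) = \{x\},\ \ ,\ [x \in F(y) \Leftrightarrow x \neq \bot \land x = F'(y)] \,\rangle$$

$$r_3 = \langle\, (x \neq \bot \land y \neq \bot) \Rightarrow (x = F(y) \Leftrightarrow y = F(x)),\ \ ,\ [x = F(y) \land x \neq \bot \Leftrightarrow F'(y) = \{x, y\} \land x \neq y] \,\rangle$$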

As is typical in the implementation, these rules are normally used only
in one direction. As noted in section , $r_1$ reformulates a binary
relation in a theory as a function $F_R$ onto sets:
$F_R(x) = \{y \mid R(x,y)\}$. Also as noted in section , when applied to
a theory containing a function $F$ whose range elements are sets of size
one, $r_2$ introduces a function $F'$ such that $F'(y) = x$ just in case
$x \in F(y)$. The rule $r_3$ reformulates an $F$ that is its own inverse
as a function $F'$, mapping an individual into sets of size two such
that $F'(x) = \{x, F(x)\}$.

Given the schemes above, DRAT is unable to design a satisfiability
procedure for $T$ without reformulation. In an effort to design a
satisfiability procedure for all of $T$, the DRAT implementation
repeatedly reformulates the problem, finally producing a formulation in
terms of a function that we will call couple, mapping an individual to
the married couple of which he or she is a member.

DRAT uses rule $r_1$ to reformulate $T$ in terms of a function that we
will call spouses, a mapping from an individual to the set of his or her
spouses. $\mathcal{R}(r_1, T)$ is

$$x \notin spouses(x),$$
$$x \in spouses(y) \Rightarrow y \in spouses(x),$$
$$x \in spouses(y) \land y \in spouses(z) \Rightarrow x \notin spouses(z),$$
$$y \in spouses(x) \land z \in spouses(x) \Rightarrow y = z$$

DRAT uses rule $r_2$ to reformulate $\mathcal{R}(r_1, T)$ in terms of a
partial function that we will call spouse, a mapping from an individual
to his or her spouse. $\mathcal{R}(r_2, \mathcal{R}(r_1, T))$ is

$$x \neq spouse(x) \lor x = \bot,$$
$$x = spouse(y) \land x \neq \bot \Rightarrow y = spouse(x) \land y \neq \bot,$$
$$x = spouse(y) \land x \neq \bot \land y = spouse(z) \land y \neq \bot \Rightarrow x \neq spouse(z) \lor x = \bot,$$
$$y = spouse(x) \land y \neq \bot \land z = spouse(x) \land z \neq \bot \Rightarrow y = z$$

Note that the second and fourth clauses in this set follow from
instances of $\mathcal{F}$ and $\mathcal{E}$ respectively. Hence, if
DRAT were to terminate at this point, $T'_c$ would include only the
first and third clauses.

DRAT uses rule $r_3$ to reformulate the above theory in terms of the
function couple. The result is

$$couple(x) \neq \{x, x\} \lor x = x,$$
$$couple(x) = \{x, y\} \land x \neq y \Rightarrow couple(y) = \{y, x\} \land y \neq x,$$
$$couple(x) = \{x, y\} \land x \neq y \land couple(y) = \{y, z\} \land y \neq z \Rightarrow couple(x) \neq \{x, z\} \lor x = z,$$
$$couple(y) = \{x, y\} \land y \neq x \land couple(z) = \{z, x\} \land z \neq x \Rightarrow y = z$$

All of the clauses in this set follow from the combination of
$\mathcal{S}_2$ and an instance of $\mathcal{E}$ containing the
uninterpreted function symbol couple. Thus, through the use of
reformulation, DRAT succeeds in designing a satisfiability procedure for
the theory $T$. Without reformulation it is unable to design a procedure
for any subset of $T$.

STEPS TOWARDS THE 
COMPLETENESS OF DRAT 

This section proves two results towards the completeness of DRAT.
First, we show that DRAT designs satisfiability procedures. If DRAT
successfully designs a procedure for some set of axioms $T_c$, then that
procedure can be used to decide the problem
$\langle \Sigma, T_c, \Phi \rangle$ for any conjunctions of ground
literals $\Sigma$ and $\Phi$. Second, we consider the addition of
reformulation to DRAT and show that a satisfiability procedure for
$\mathcal{R}^*(T_c)$ can be used as a satisfiability procedure for $T_c$
so long as $\mathcal{R}^*(\Sigma)$ is a conjunction of literals. These
results are necessary preliminaries for the proof of completeness in
section .

DRAT DESIGNS SATISFIABILITY 
PROCEDURES 

Before proceeding to prove that DRAT designs satisfiability procedures,
we recall properties of schemes presented thus far and discuss some
additional required properties.

Recall that a scheme for a theory $T$ is a procedure that decides the
satisfiability of $\Sigma \cup T$, where $\Sigma$ is a conjunction of
ground literals. Given a particular $\Sigma$, each scheme also computes
the set of equalities between constants in $\Sigma$ that follow from
$\Sigma \cup T$. If $T$ is nonconvex, its scheme also computes
disjunctions of equalities between constants in $\Sigma$ that follow
from $\Sigma \cup T$.

We call a first-order theory whose formulas contain no existential
quantifiers a quantifier-free theory. An additional requirement on
schemes is that their theories be quantifier-free. As a practical
matter, this is not a serious restriction beyond restricting schemes to
be tractable. See [Oppen80] for further discussion of this point.

The theories of schemes are also required to have infinite models. The
equality propagation technique may not work if a theory has only finite
models because, given a set of constant symbols larger than the set of
individuals in the model’s domain, such a theory implies the disjunction
of equalities between those constant symbols. Theories with infinite
models do not imply disjunctions of equalities between variables.
Therefore, given a theory $T$ with infinite models, such disjunctions
can only follow from $T \cup \Sigma$, for some $\Sigma$ whose
satisfiability is being decided. Any disjunctions of equalities between
constants that follow must involve only constants mentioned in $\Sigma$.
This restriction to theories with infinite models does not appear to be
significant. To date, we have not found any schemes that we could not
include because they violated this restriction.

The theorem proved below is similar to the theorem given in
[Nelson&Oppen79]. It differs in the addition of the requirement that
each scheme’s theory have infinite models. The theorem appearing in
[Nelson&Oppen79] is incorrectly stated. The reason a different proof is
included here is that the proof given in [Nelson&Oppen79] is
incorrect.2 We also include our proof because the technique is much more
direct and serves as a foundation for research in progress to extend our
results.

Theorem 1 Let $T_1$ and $T_2$ be theories with no common nonlogical
symbols. If there are schemes for $T_1$ and $T_2$, there is a scheme for
$T_1 \cup T_2$.

Proof: We prove that the procedure described in section for combining
two schemes is a scheme for $T_1 \cup T_2$. If the scheme for $T_1$ or
$T_2$ reports “unsatisfiable,” clearly
$\Sigma_1 \cup \Sigma_2 \cup T_1 \cup T_2$ is unsatisfiable and, since
$\Sigma_1 \cup \Sigma_2$ and $\Sigma$ are cosatisfiable,
$\Sigma \cup T_1 \cup T_2$ is unsatisfiable. We must show that if the
procedure of section reports “satisfiable,” $\Sigma \cup T_1 \cup T_2$
is satisfiable. This is done by showing how to construct a model of
$\Sigma \cup T_1 \cup T_2$ when the procedure reports “satisfiable.”

Let $C = \{c_0, \ldots, c_n\}$ be the set of constant symbols appearing
in $\Sigma_1$ or $\Sigma_2$. Let $E$ be the set of equalities propagated
by the procedure of section . As we will see, when the procedure halts,
$E$ contains all the $c_1 = c_2$ such that $c_1, c_2 \in C$ and
$\Sigma_1 \cup \Sigma_2 \cup T_1 \cup T_2 \models c_1 = c_2$. $E$ will
also contain any equalities chosen when case splitting occurs.

Let $\bar{E} = \{c_1 = c_2 \mid c_1, c_2 \in C \land c_1 = c_2 \notin E\}$.
Since the schemes for $T_1$ and $T_2$ reported “satisfiable,” there are
models of $\Sigma_1 \cup T_1 \cup E$ and $\Sigma_2 \cup T_2 \cup E$. Let
$M_1$ and $M_2$ be models of $\Sigma_1 \cup T_1 \cup E$ and
$\Sigma_2 \cup T_2 \cup E$ respectively that agree on the interpretation
of the equalities in $\bar{E}$. We show how to construct a model $M$ of
$\Sigma \cup T_1 \cup T_2$ from $M_1$ and $M_2$.

Before giving this construction, we show that it is possible to pick an
$M_1$ and $M_2$ that agree on $\bar{E}$. First note that if $\bar{E}$ is
empty, all $M_1$ and $M_2$ agree. Now suppose that $\bar{E}$ is not
empty. In this case, there exists an $M_1$ and an $M_2$ that do not
satisfy any equality in $\bar{E}$. For suppose to the contrary. In
particular, suppose that every $M_1$ satisfies some equality in
$\bar{E}$. If $\bar{E}$ contains exactly one equality, $c_1 = c_2$, then
$\Sigma_1 \cup T_1 \cup E \models c_1 = c_2$ and $c_1 = c_2 \in E$, not
$\bar{E}$. If $\bar{E}$ contains more than one equality,
$\Sigma_1 \cup T_1 \cup E$ entails the disjunction of equalities in
$\bar{E}$. But then $\Sigma_1 \cup T_1 \cup E$ is nonconvex, which is
impossible because, instead of returning satisfiable, the algorithm in
section would have case split in this situation. This same argument can
be made for $M_2$ and, hence, there exists an $M_2$ that does not
satisfy any of the equalities in $\bar{E}$. Thus, we can choose an $M_1$
and $M_2$ that agree on the interpretation of the equalities in
$\bar{E}$.

2 A correct version of the theorem appears in [Nelson84]; however, the
proof given there is still incorrect.


Note that since $M_1$ and $M_2$ agree on the interpretation of the
equalities in $E$ and in $\bar{E}$, they agree on the interpretation of
every equality between constants in $C$.

Let $M_1 = \langle D_1, R_1, F_1, C_1 \rangle$, where $D_1$ is the
domain of $M_1$, $R_1$ is the interpretation of relation symbols of
$M_1$ in $D_1$, $F_1$ is the interpretation of the function symbols of
$M_1$ and $C_1$ is the interpretation of individual constant symbols in
$M_1$. Similarly, let $M_2 = \langle D_2, R_2, F_2, C_2 \rangle$.

We now construct $M$ by merging $M_1$ and $M_2$ as follows. The domain
of $M$ is $D_1 \cup D'_2$, where $D'_2$ is the domain of $M'_2$, a
modified version of $M_2$. $M'_2$ is obtained by replacing individuals
in $D_2$ by individuals in $D_1$ when they are designated by the same
constant symbol. For all constant symbols $c \in C$, replace every
occurrence of $C_2(c)$ in $D_2$ by $C_1(c)$, i.e., $C'_2(c) = C_1(c)$
when $c$ is a shared constant symbol and $C'_2(c) = C_2(c)$ otherwise.
For all $R$ in the domain of $R_2$, let $R'_2(R)$ be the set $R_2(R)$
modified by the above replacement procedure. Similarly, let $F'_2$ be
the new interpretation of the function symbols of $M_2$.
$M'_2 = \langle D'_2, R'_2, F'_2, C'_2 \rangle$.

$M'_2$ and $M_2$ are isomorphic structures because $M_1$ and $M_2$ agree
on the interpretation of every equality between constants in $C$. If
$M_1$ and $M_2$ did not agree, then $M'_2$ and $M_2$ would not be
isomorphic. For suppose that $M_1 \models c_1 = c_2$ but
$M_2 \not\models c_1 = c_2$. Then the two constant symbols designate the
same individual in $D'_2$ and different individuals in $D_2$ and, hence,
$M_2$ is not isomorphic to $M'_2$.

To finish the construction of $M$, we take
$M = \langle D_1 \cup D'_2, R_1 \cup R'_2, F_1 \cup F'_2, C_1 \cup C'_2 \rangle$.
Since $M_1 \models \Sigma_1 \cup T_1$ and
$M'_2 \models \Sigma_2 \cup T_2$,
$M \models \Sigma_1 \cup \Sigma_2 \cup T_1 \cup T_2$. Since
$\Sigma_1 \cup \Sigma_2$ and $\Sigma$ are cosatisfiable,
$M \models \Sigma \cup T_1 \cup T_2$ and the proof of the theorem is
complete. □

The fact that DRAT designs satisfiability procedures 
is a direct consequence of theorem 1 . Since the result of 
combining two schemes is again a scheme, any number 
of schemes can be combined by this method. 

DRAT DOES ISOMORPHIC 
REFORMULATION 

This section includes the proofs of two properties of DRAT’s
reformulation procedure $\mathcal{R}$. These results are sufficient to
show how a satisfiability procedure generated by DRAT for some
reformulated theory can be used to solve the original problem.

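Lemma 1 If a reformulation rule (rule) is an extending definition in $T$
of the form $\langle P, Q, \Theta \Leftrightarrow \Psi \rangle$ and
$T \models P$, then $\mathcal{R}(rule, T)$ is an isomorphic
reformulation of $T$.

Proof: The condition that must be met is that if $T \models P$,
$T \models \phi \Leftrightarrow \mathcal{R}(rule, T) \models \mathcal{R}(rule, \phi)$,
for any clause $\phi \in \mathcal{L}(T)$. We prove the equivalent fact
that if $T \models P$,

$$SAT(T \cup \{\neg\phi\}) \Leftrightarrow SAT(\mathcal{R}(rule, T) \cup \neg\mathcal{R}(rule, \phi)),$$

where $SAT(T)$ means that $T$ is satisfiable.

[⇒] If $SAT(T \cup \{\neg\phi\})$, then
$SAT(T \cup \{\Theta \Leftrightarrow \Psi\} \cup \{\neg\phi\})$ because,
by the definition of extending definition, every model of $T$ can be
extended to a model of $T \cup \{\Theta \Leftrightarrow \Psi\}$.
Therefore, there exists a model of
$T \cup \{\Theta \Leftrightarrow \Psi\} \cup \{\neg\phi\}$. But

$$T \cup \{\Theta \Leftrightarrow \Psi\} \cup \{\neg\phi\} \models \mathcal{R}(rule, T) \cup \neg\mathcal{R}(rule, \phi).$$

Hence every model of
$T \cup \{\Theta \Leftrightarrow \Psi\} \cup \{\neg\phi\}$ is a model of
$\mathcal{R}(rule, T) \cup \neg\mathcal{R}(rule, \phi)$. Since there
exists a model of
$T \cup \{\Theta \Leftrightarrow \Psi\} \cup \{\neg\phi\}$, there exists
a model of $\mathcal{R}(rule, T) \cup \neg\mathcal{R}(rule, \phi)$ and
hence, it is satisfiable.

[⇐] The proof in this direction is similar, with the added step of
showing that every model of
$\mathcal{R}(rule, T) \cup \neg\mathcal{R}(rule, \phi)$ can be extended
to a model of
$\mathcal{R}(rule, T) \cup \{\Theta \Leftrightarrow \Psi\} \cup \neg\mathcal{R}(rule, \phi)$.
Since rule is an extending definition, every model of a theory $T_1$
that entails $Q$ can be extended to a model of
$T_1 \cup \{\Theta \Leftrightarrow \Psi\}$. By the definition of
$\mathcal{R}$, the clauses of $Q$ will appear in $\mathcal{R}(rule, T)$
and hence $\mathcal{R}(rule, T) \models Q$. Therefore, every model of
$\mathcal{R}(rule, T)$ can be extended to a model of
$\mathcal{R}(rule, T) \cup \{\Theta \Leftrightarrow \Psi\}$. Thus, if
$\mathcal{R}(rule, T) \cup \neg\mathcal{R}(rule, \phi)$ is satisfiable,
so is $T \cup \{\neg\phi\}$. □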

It follows directly from this lemma and the fact that extending
definitions can be used in either direction, that a reformulation rule
$(P \land Q) \Rightarrow [\Theta \Leftrightarrow \Psi]$ with the
rf-symbols($\Psi$) instantiated in terms of a theory $T$ can be used to
reformulate $T$ in terms of $\Theta$ so long as $T \models Q$.

Lemma 2 For any reformulation rule (rule), the function
$\lambda(t).\mathcal{R}(rule, t)$ is computable.

Proof: Suppose the biconditional of rule is
$\Theta \Leftrightarrow \Psi$ and $\mathcal{R}$ applies rule to rewrite
occurrences of $\Psi$ to occurrences of $\Theta$ in $T$, as described in
section . Since rf-symbols($\Theta$) are disjoint from rf-symbols($T$),
a rewrite step can never introduce a pattern of literals to which rule
can be applied a second time. The rewrite is applied repeatedly until
one of the following events occurs: (1) all of the symbols in
rf-symbols($\Psi$) are removed from $T$ or (2) no new occurrences of
$\Psi$ can be found, even though symbols in rf-symbols($\Psi$) are still
present. In either case, repeated application of the rewrite rule
terminates. Hence, $\lambda(t).\mathcal{R}(rule, t)$ is computable. □

The two preceding lemmas are sufficient to show that a satisfiability
procedure for $\mathcal{R}^*(T_c)$ can be used to solve the problem
$\langle \Sigma, T_c, \Phi \rangle$, so long as $\mathcal{R}^*(\Sigma)$
is a conjunction of ground literals. Assuming that
$\mathcal{R}^*(\Sigma)$ is a conjunction, the satisfiability procedure
is used to solve the problem by solving
$\langle \mathcal{R}^*(\Sigma), \mathcal{R}^*(T_c), \mathcal{R}^*(\Phi) \rangle$
as follows. For each $\phi \in \Phi$, if $\neg\mathcal{R}^*(\phi)$ is a
conjunction of literals, we use the procedure to determine if
$\mathcal{R}^*(\Sigma) \cup \mathcal{R}^*(T_c) \cup \neg\mathcal{R}^*(\phi)$
is unsatisfiable. This is the case if and only if
$\Sigma \cup T_c \cup \neg\phi$ is unsatisfiable. If
$\neg\mathcal{R}^*(\phi)$ is a disjunction of literals, the procedure is
used to determine the satisfiability of
$\mathcal{R}^*(\Sigma) \cup \mathcal{R}^*(T_c) \cup l$, for each literal
$l \in \neg\mathcal{R}^*(\phi)$. If any of these is satisfiable,
$\mathcal{R}^*(\Sigma) \cup \mathcal{R}^*(T_c) \cup \neg\mathcal{R}^*(\phi)$
is satisfiable; otherwise it is unsatisfiable.
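The two cases can be summarized in a short Python sketch. The interface is an assumption for illustration: `is_sat` stands for the designed literal satisfiability procedure (with the reformulated theory built in), `sigma` is the reformulated conjunction as a list of literals, and the reformulated negated query is given as a pair `("and", literals)` or `("or", literals)`. The toy procedure used in the example only checks propositional consistency.

```python
def negated_query_satisfiable(is_sat, sigma, negated_query):
    """Decide satisfiability of sigma together with the negated query."""
    kind, literals = negated_query
    if kind == "and":
        # a conjunction can be handed to the procedure in one call
        return is_sat(sigma + literals)
    # a disjunction is satisfiable iff some disjunct is
    return any(is_sat(sigma + [literal]) for literal in literals)

def toy_is_sat(literals):
    """Toy stand-in: satisfiable unless both p and ('not', p) occur."""
    positive = {l for l in literals if isinstance(l, str)}
    negative = {l[1] for l in literals if not isinstance(l, str)}
    return positive.isdisjoint(negative)

sigma = ["p", "q"]
# The query p and q is entailed: its negation, not-p or not-q, is unsatisfiable.
print(negated_query_satisfiable(toy_is_sat, sigma,
                                ("or", [("not", "p"), ("not", "q")])))  # False
```

A result of `False` means the original query is entailed; a disjunctive negated query costs one call to the procedure per disjunct.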

THE COMPLETENESS OF DRAT 

Two simplifying assumptions were made in the previ- 
ous sections. First, in definition 3, it was assumed that 
a problem for DRAT was of a restricted form. Second, 
it was assumed that DRAT’s success depended on designing
a satisfiability procedure for all of $T_c$. Both of
these assumptions are now relaxed and we show how a 
literal satisfiability procedure is interfaced with a res- 
olution theorem prover in such a way that the proce- 
dure/theorem prover combination is complete. 

A problem for DRAT is now taken to be a pair
$\langle T, \varphi \rangle$, where $T$ is a set of first-order formulas and $\varphi$
is a first-order formula. A pair $\langle T, \varphi \rangle$ is interpreted
as the question, “$T \models \varphi$?”

As a typical preprocessing step for resolution theorem
proving, $T$ and $\neg\varphi$ are converted to sets of clauses
which will be called $T'$ and $\neg\varphi'$ respectively. Let $T_c$
be the nonground clauses in $T'$. As before, DRAT
is used to design a literal satisfiability procedure for
$T_c$. However, instead of exiting with failure if it is
unable to design a procedure for all of $T_c$, it returns
the satisfiability procedure and $T'_c$, those clauses not
incorporated into the satisfiability procedure. Also, as
before, DRAT returns the reformulation map $\mathcal{R}^*$.

The algorithm given in section refers to the set of
clauses for which a literal satisfiability procedure has
been designed as $T_I$. Here that procedure is referred
to as $S_{T_I}$. We show how $S_{T_I}$ is used along with a
resolution theorem prover to demonstrate the unsatisfiability
of $Cl = \mathcal{R}^*(T') \cup \mathcal{R}^*(\neg\varphi')$. The nonground
clauses of $Cl$ are manipulated by the theorem prover in
the usual way, except that clauses in $T_I$ are prohibited
from resolving with ground clauses. These resolutions
are unnecessary because $S_{T_I}$ is a “compression” of any
resolution steps that can result from such a resolvant.

$S_{T_I}$ is used in the manipulation of ground clauses in
$Cl$ and ground clauses derived from $Cl$ during theorem
proving. It is interfaced to the theorem prover via
theory resolution [Stickel85]. One type of theory resolution,
called total narrow theory resolution, requires
a decision procedure for a theory $T$, given a set of literals
$L$, to compute subsets $L'$ of $L$ such that $L' \cup T$
is unsatisfiable. Such a procedure is used to compute
$T$-resolvants of a set of clauses as follows. Consider the
decomposition of the clauses into $K_i \vee L_i$, where each
$K_i$ is a single literal in $\mathcal{L}(T)$ and $L_i$ is a disjunction of
literals (possibly empty). For each subset of the $K_i$, say
$\{K_{i_1}, \ldots, K_{i_n}\}$, that is unsatisfiable in $T$, the clause
$L_{i_1} \vee \cdots \vee L_{i_n}$ is a $T$-resolvant.
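
A small sketch may help make the construction of $T$-resolvants concrete. It assumes a toy theory of strict ordering, with `order_unsat` as a hypothetical decision procedure for sets of literals of the form a < b; the names `t_resolvants`, `Key` and `Clause` are introduced only for this illustration. The enumeration of subsets is exhaustive and therefore exponential, which is acceptable only for a demonstration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Key = Tuple[str, str]                      # (a, b) stands for the theory literal a < b
Clause = Tuple[Key, FrozenSet[str]]        # K_i paired with the remaining literals L_i


def order_unsat(keys: Sequence[Key]) -> bool:
    """Toy decision procedure: a set of a < b literals is unsatisfiable iff it forces x < x."""
    reach = set(keys)
    changed = True
    while changed:                          # naive transitive closure
        changed = False
        for a, b in list(reach):
            for c, d in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return any(a == b for a, b in reach)


def t_resolvants(clauses: List[Clause],
                 is_unsat: Callable[[Sequence[Key]], bool]) -> Set[FrozenSet[str]]:
    """Collect the disjunction of the L_i for every subset of the K_i that is unsatisfiable in T."""
    found = set()
    for size in range(1, len(clauses) + 1):
        for idxs in combinations(range(len(clauses)), size):
            if is_unsat([clauses[i][0] for i in idxs]):
                found.add(frozenset().union(*(clauses[i][1] for i in idxs)))
    return found


# (a < b) v P,  (b < c) v Q,  (c < a) v R
clauses = [(("a", "b"), frozenset({"P"})),
           (("b", "c"), frozenset({"Q"})),
           (("c", "a"), frozenset({"R"}))]
print(t_resolvants(clauses, order_unsat))
```

Here only the three ordering literals taken together are contradictory, so the single $T$-resolvant is the clause P ∨ Q ∨ R. An empty `frozenset` in the result would represent the empty clause.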

The theorem prover constructs $T_I$-resolvants from
ground clauses, using $S_{T_I}$ to compute sets of ground
literals that are unsatisfiable in $T_I$. Let $GrL$ be the set
of ground unit clauses in $Cl$ and let $GrCl$ be the set of
ground nonunit clauses in $Cl$. First, the ground clauses
are separated into clauses that are in $\mathcal{L}(T_I)$ and clauses
that are not. This is accomplished for the clauses in
$GrL$ using the procedure described in section ; it is
accomplished for clauses in $GrCl$ in a similar fashion.

If a ground clause $c_1$ contains a literal that is not
in $\mathcal{L}(T_I)$ and a ground clause $c_2$ contains the negation
of that literal, the theorem prover computes the resolvant
of $c_1$ and $c_2$ in the normal way. $T_I$-resolvants
are computed using $S_{T_I}$ to compute sets of ground literals
that are unsatisfiable in $T_I$ as follows. Let $GrL_{T_I}$
be the set of literals in $GrL$ that are in $\mathcal{L}(T_I)$. Let
$GrCl_{T_I}$ be the set of literals in $\mathcal{L}(T_I)$ appearing in
clauses of $GrCl$. We input progressively larger subsets
of $GrLits = GrL_{T_I} \cup GrCl_{T_I}$ to $S_{T_I}$ as long as those
sets are satisfiable in $T_I$. Once a set is unsatisfiable in
$T_I$, all supersets of it will also be unsatisfiable. When
the theorem prover deduces a new ground literal in
$GrL_{T_I}$, it is added to $GrLits$. The smallest subsets
of $GrLits$ found to be unsatisfiable in $T_I$ are used to
compute $T_I$-resolvants of ground clauses.
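
The search over progressively larger subsets can be sketched as follows. This is a simplified illustration, not the DRAT implementation: `eq_unsat` is a hypothetical decision procedure for ground equalities and disequalities between constants, standing in for the designed procedure, and `minimal_unsat_subsets` is a name introduced here. The sketch works on a fixed set of literals and does not model literals being added during theorem proving.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

Literal = Tuple[str, str, str]              # ("eq" | "neq", constant, constant)


def eq_unsat(literals: Iterable[Literal]) -> bool:
    """Toy decision procedure for ground equality over constants (union-find)."""
    lits = list(literals)
    parent = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for kind, a, b in lits:
        if kind == "eq":
            parent[find(a)] = find(b)       # merge the two equivalence classes
    return any(find(a) == find(b) for kind, a, b in lits if kind == "neq")


def minimal_unsat_subsets(gr_lits: Iterable[Literal],
                          is_unsat: Callable[[Iterable[Literal]], bool]
                          ) -> List[FrozenSet[Literal]]:
    """Test subsets in order of increasing size, skipping supersets of known unsatisfiable sets."""
    found: List[FrozenSet[Literal]] = []
    ordered = sorted(gr_lits)
    for size in range(1, len(ordered) + 1):
        for cand in combinations(ordered, size):
            s = frozenset(cand)
            if any(m <= s for m in found):
                continue                    # every superset of an unsatisfiable set is unsatisfiable
            if is_unsat(s):
                found.append(s)             # all smaller subsets were satisfiable, so s is minimal
    return found


gr_lits = {("eq", "a", "b"), ("eq", "b", "c"), ("neq", "a", "c"), ("neq", "a", "b")}
for subset in minimal_unsat_subsets(gr_lits, eq_unsat):
    print(sorted(subset))
```

Because candidates are visited by increasing size and supersets of earlier hits are skipped, every set that is reported has only satisfiable proper subsets, which is the minimality property the completeness argument relies on.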

Theorem 2 Given the problem $\langle T, \varphi \rangle$, let $S_{T_I}$ be
a literal satisfiability procedure for $T_I \subseteq \mathcal{R}^*(T)$. If
$T \models \varphi$, $S_{T_I}$ combined with the theorem prover will
demonstrate the unsatisfiability of $Cl$.

Proof: In [Stickel85], Stickel shows that, given a set
of clauses $K_i \vee L_i$, if a decision procedure for a theory
$T$ computes all subsets of the $K_i$ that are minimally
unsatisfiable in $T$, total narrow theory resolution is complete.
We must show that the above procedure for computing
$T_I$-resolvants computes all subsets of $GrLits$ that are
minimally unsatisfiable in $T_I$. Clearly, so long as $S_{T_I}$ is
a literal satisfiability procedure, the above procedure
computes all these subsets. Thus, the completeness
result follows directly from the results of section . □

The procedure described above can be made much
more efficient. There are several refinements used by
the DRAT implementation to consider far fewer subsets
for unsatisfiability in $T_I$. We discuss two of these
here. One refinement is to distinguish between literals
in $GrL_{T_I}$ and $GrCl_{T_I}$. First, we consider the
satisfiability of $GrL_{T_I}$. If this is unsatisfiable, we are done.
Otherwise, we consider progressively larger sets of literals
appearing in clauses in $GrCl_{T_I}$. For each such set
$s$, $S_{T_I}$ is used to determine whether or not $GrL_{T_I} \cup s$
is unsatisfiable in $T_I$.

Note that the subsets identified with this refinement
are not always minimal: it is possible for a subset of
$GrCl_{T_I}$ union a subset of $GrL_{T_I}$ to be unsatisfiable in
$T_I$. However, it turns out that completeness of theory
resolution is retained in this case, since the extraneous
literals are in $GrL_{T_I}$ and, therefore, are unit clauses.

A second simpler refinement only considers subsets
of $GrCl_{T_I}$ each of whose elements appears in a different
clause in $GrCl$.
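
The two refinements can be combined in a short sketch. The code below is an assumption-laden illustration rather than DRAT's actual code: `toy_unsat` is a hypothetical procedure for the one-clause theory {p ∧ q → r}, `refined_unsat_sets` is a name introduced here, and `gr_cl` is taken to list, for each ground nonunit clause, only those of its literals that belong to the language of the theory.

```python
from itertools import combinations, product
from typing import Callable, FrozenSet, Iterable, List, Sequence


def toy_unsat(literals: Iterable[str]) -> bool:
    """Toy procedure for the theory {p & q -> r}; "~x" denotes the negation of "x"."""
    s = set(literals)
    return {"p", "q", "~r"} <= s or any("~" + lit in s for lit in s)


def refined_unsat_sets(gr_l: Iterable[str],
                       gr_cl: Sequence[Sequence[str]],
                       is_unsat: Callable[[Iterable[str]], bool]
                       ) -> List[FrozenSet[str]]:
    """Find sets s of nonunit-clause literals such that the unit literals plus s are unsatisfiable."""
    base = frozenset(gr_l)
    # First refinement: always include every unit literal, and test them alone first.
    if is_unsat(base):
        return [frozenset()]
    found: List[FrozenSet[str]] = []
    for size in range(1, len(gr_cl) + 1):
        for idxs in combinations(range(len(gr_cl)), size):
            # Second refinement: take one literal from each of the chosen clauses.
            for pick in product(*(gr_cl[i] for i in idxs)):
                s = frozenset(pick)
                if any(m <= s for m in found):
                    continue                # superset of a set already known to suffice
                if is_unsat(base | s):
                    found.append(s)
    return found


# Unit clause: p.  Nonunit clauses contribute the theory literals {q, s} and {~r, t}.
print(refined_unsat_sets(["p"], [["q", "s"], ["~r", "t"]], toy_unsat))
```

In this example the reported set {q, ~r} is unsatisfiable only together with the unit literal p, which mirrors the remark that the extraneous literals come from unit clauses.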

As a final point about the efficiency of the procedure
for computing subsets that are minimally unsatisfiable
in $T_I$, recall that schemes are required to be incremental.
Because of this, $S_{T_I}$ is used very efficiently to
consider progressively larger sets of literals.


It is often most effective to leverage the use of $S_{T_I}$
by doing as much of the theorem proving as possible
at the “ground level.” The DRAT implementation uses a
“set of support” strategy, which is very effective in
accomplishing this when $\varphi$ is ground because it tends
to produce ground resolvants.

Summary and Ongoing Work 

We have presented a formalization of DRAT: a technique
for automatic design of satisfiability procedures.
We have shown how these procedures are interfaced to
a theorem prover so that it can, in many cases, prove
theorems more efficiently. Given $T$, the set of axioms
of a problem, and $S_{T'}$, a literal satisfiability procedure
designed for $T' \subseteq T$, we have proven that for any
first-order statement $\varphi$, if $T \models \varphi$, the theorem
prover/$S_{T'}$ combination will prove $\varphi$.

The major steps of our argument were as follows: 

1. We showed that a combination of satisfiability proce- 
dures with certain properties is again a satisfiability 
procedure. 

2. We showed that the reformulation that is essential 
to DRAT’s effectiveness is isomorphic reformulation 
and, therefore, a satisfiability procedure of a refor- 
mulated theory can be used to solve problems in the 
original theory. 

3. We proved the completeness of our technique for
combining literal satisfiability procedures with a theorem
prover. In this combination, $S_{T'}$ is used to
compute $T'$-resolvants from ground clauses and the
theorem prover is restricted so that it does not resolve
ground clauses on literals in $\mathcal{L}(T')$.

In our ongoing work, we are attempting to extend
DRAT’s scheme combination technique. As much as
possible, we would like to remove the restriction on
the sharing of nonlogical symbols between component
scheme instances in combinations. We are exploring
the conditions under which limited types of overlap be- 
tween nonlogical symbols is allowed. When overlap is 
allowed, component schemes must propagate more in- 
formation than just equalities between constant sym- 
bols. In most cases where overlap is allowed and in 
which the schemes propagate at least the set of equal- 
ities between constants, it is not difficult to show the 
completeness of a propagation technique. The major 
issue that arises is proving that the propagation termi- 
nates. 

As an example, consider allowing two schemes to 
share function symbols. The schemes must propagate 
all equalities between ground terms involving shared 
function symbols. The proof technique used in section 
can be extended to prove that such schemes combined 
by an appropriately extended propagation technique 
will produce semi-decision procedures for the combi- 
nations of their theories. However, in general, it is not 
possible to prove that the propagation will terminate. 
One situation in which overlap is allowed occurs
when the theories of schemes are sets of clauses in a 
sorted first-order logic. In this case, a function symbol 
F whose range is disjoint from its domain can be shared 
between schemes because terms of the form F(F(x)) 
are not well formed and, hence, it is easy to show that 
propagation of terms involving F will terminate. 
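
The sorted-logic argument can be illustrated with a minimal well-formedness check. The sorts `A` and `B`, the variable `x` and the helper `sort_of` are hypothetical names chosen for the example; the only assumption taken from the text is that the range of F is disjoint from its domain.

```python
from typing import Optional, Tuple, Union

Term = Union[str, Tuple[str, "Term"]]       # a variable name, or (function symbol, argument)

SIGNATURE = {"F": ("A", "B")}               # hypothetical: F maps sort A to sort B
VARIABLES = {"x": "A"}                      # hypothetical: x has sort A


def sort_of(term: Term) -> Optional[str]:
    """Return the sort of a term, or None if the term is not well formed."""
    if isinstance(term, str):
        return VARIABLES.get(term)
    symbol, arg = term
    domain, rng = SIGNATURE[symbol]
    # The application is well formed only if the argument has the domain sort.
    return rng if sort_of(arg) == domain else None


print(sort_of(("F", "x")))                  # F(x) has sort B
print(sort_of(("F", ("F", "x"))))           # F(F(x)) is rejected
```

Since F cannot be applied to its own result, terms built from F have bounded nesting depth, so only finitely many such terms can be formed over a finite set of constants and variables.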
This article summarizes some aspects of the research carried out in the Center of Electromagnetics
Research, Northeastern University, in the context of the Sensor/Data Fusion project, under the
supervision of Prof. M. M. Kokar; our interest in change of representation originates from our application needs.
Related issues of our research are briefly discussed in the following paragraphs.

Sensor/Data Fusion (SDF) is an application-driven discipline, facing the problem of associating,
combining and compromising among various information sources. An information source can be a
physical sensor, data from data/knowledge bases, or an external constraint or guidance either from a
human or from another system. Sensor/Data Fusion can be included in the more general field of Sensory
Information Processing, where the term “sensor” is used in a broad sense.

Our research in the area of SDF lies in between theory and practice. The main goals set are the 
following: 

1. Develop a systematic approach (a methodology) for designing, implementing, testing and main- 
taining sensor/data fusion systems that can: 

(a) interpret sensory information, 

(b) reason about situations using this information plus inputs from data bases (collections of 
known facts) and knowledge bases (collections of rules and heuristics) and other SDF systems. 

2. Through theoretical and experimental investigations develop a theoretical framework in which 
designs of such systems can be formulated and analyzed. 

The approach taken in this research is goal driven: first propose design requirements and specifi- 
cations, then identify theoretical questions needed to be answered in order to carry out the proposed 
design, and then investigate these questions in a systematic way. 

The prevailing idea in our conceptualization of an SDF system is the following: data should be dynamically
transformed, processed and represented in suitable forms, so as to facilitate combining and reasoning
about them. To limit the complexity of a Sensor/Data Fusion System, we have proposed a
layered architecture [KZR91]. In establishing a good definition for the layers two factors are considered:
the semantics of the data, and the form of representation used. To transform data from one layer to 
another a process of abstraction must be invoked; on the other hand a fusion process combines data of 
the same layer into data of the same or adjacent layers. While the system requirements of autonomy, 
fidelity, consistency, and versatility must be borne in the designer’s mind, the main design issues are: 


• what are the layers for a particular class of domains, 

• what are the abstraction and fusion mechanisms, 

• what should be the structure within the layers, 

• how would the system dynamically choose among alternative processing paths. 
An initial approach to guide the design of a Sensor/Data Fusion System makes use of logic [KZ92].
This provides an idealized framework, which should be borne in mind in the design of an SDF system;
however, in an actual design, issues of noise and uncertainty must also be dealt with. From the logical
standpoint, we have to deal with theories and their models; collectively, they represent formal systems.
Theories are collections of statements about the world expressed in a particular formal language (the language
of the theory). A theory includes a set of inference rules which are used for deriving new statements
through the derivation process (theorem proving). Associated with a theory and a model is a set of
mappings (interpretation rules) which assign a correspondence between the theory and the model. This
correspondence must fulfill the adequacy postulate: whatever statement can be derived in the theory
can be shown to be true in the model. The model in our case is a set of data (including sensory data)
about the world together with a set of operations on the data. Instead of deriving statements about
the world using the theory, one can check whether the statement is fulfilled in the model; this process
is called model checking. It is an open research question when it is better to use theorem
proving and when to use model checking. We intend to investigate this question for the domain of
sensor/data fusion.

The requirement of deriving consistent sets of conclusions must incorporate both models (real data
about the world) and theories. In order to preserve consistency when implementing SDF system layers
we need to make sure that both the abstraction process and the fusion process fulfill this requirement.
As a consequence of this, in our logical framework we could formulate the following definition of fusion:

Fusion is a process of combining two formal systems (two models and two theories) in such a way
that the result is again a formal system, i.e., such that the requirement of logical
adequacy is fulfilled.

Along with the theoretical investigations that we have carried out until now, we are designing and 
implementing a prototype experimental system [KZ91], which is composed of a number of layers (cur- 
rently two), incorporating abstraction and fusion processes. We are experimenting with this system 
on a chosen domain and examining the results of abstraction and fusion processes at each layer. This 
will allow us to formulate guidelines on the payoffs of doing abstraction vs. fusion in particular layers, 
choosing an appropriate layer for fusing data, or performing fusion dynamically in a number of layers, 
and combining/fusing the results. 